Experiments with NFS mounted rootfs

This is an account of my adventures with NFS-mounted root directories.

Some History

Our office started with around 15 people. Our work mainly involves processing claims, which from the
application perspective boils down to a browser and a word processor. Luckily, I was given the freedom
to handle the technical side of the whole setup.

I immediately decided on network booting as the best solution. No hard disks! No installation chores!
Instant updates! So I set up a Linux box with DHCP, TFTP and XDMCP forwarding. All machines had more or
less the same motherboard and configuration, so I compiled all the drivers directly into the latest 2.6
kernel (no modules), making it monolithic. Then I managed to create a rootfs with an X server (more or
less busybox + a statically compiled X + some libs) in around 20 MB. All machines had around 128 MB of
RAM, which was enough to hold a 20 MB rootfs. After some tweaks here and there we were ready for
network booting.

To sum up the whole process:

  • The machines boot with the PXE boot option set in their BIOS
  • The machine then searches for a DHCP server
  • The DHCP server responds with a filename directive, pxelinux.0, which is a bootloader found in
    syslinux
  • The machine reads the filename and contacts the TFTP server to fetch the file
  • The machine then tries to read its PXE configuration file. The filename must be the hex equivalent
    of its IP address. For example, if the IP is 192.168.0.3 the filename would be C0A80003. If that file
    is not found, the machine tries again after dropping the last character from the hex IP address,
    and so on. That means the search sequence continues with C0A8000, C0A800, C0A80, C0A8, C0A, C0, C.
    Finally it searches for a file called default, and if that is not found either, it gives up
  • The configuration file specifies the kernel and initrd. These files are downloaded and the kernel
    boots. The initrd is loaded into RAM and an inittab entry starts an X server that sends an XDMCP
    query to the server machine
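
The configuration file lookup described in the list above can be sketched in a few lines of Python.
This is only an illustration of the search order as described here, not pxelinux's own code; the
function name pxelinux_config_candidates is made up for this example.

```python
import ipaddress


def pxelinux_config_candidates(ip):
    """Return the config file names tried for `ip`, in the order described above."""
    # Eight uppercase hex digits, e.g. 192.168.0.3 -> C0A80003
    hex_ip = "{:08X}".format(int(ipaddress.IPv4Address(ip)))
    # Drop one trailing hex digit at a time: C0A80003, C0A8000, ..., C
    names = [hex_ip[:n] for n in range(len(hex_ip), 0, -1)]
    # The fallback that is tried last
    names.append("default")
    return names


if __name__ == "__main__":
    for name in pxelinux_config_candidates("192.168.0.3"):
        print(name)
```

Running it for 192.168.0.3 lists C0A80003 first and default last, which makes it easy to check which
file name a given client will ask for.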

With a staff of 15, the machines were pretty fast: all processing happened on the server and the
clients only handled display. Soon 10 more people joined and speeds noticeably dropped. Another 15
would be joining soon. I already knew that this was going to be a big problem as the headcount grew. I
had thought along the lines of load balancing, but was not too happy with the idea of adding servers,
which would also mean more cost, maintenance and administration. So an NFS-mounted rootfs was the
answer. Basically, all machines would use their own processor and memory but would have no hard disks.
This way there should be no need for any other server, and handling the growing staff should not be a
problem.

So I soon created a big rootfs with all the packages needed, compiled the kernel with root-over-NFS
support, and eventually got a machine to boot with its root over NFS. I started firefox, and it opened
as normally as it would anywhere. Then I tried to start OpenOffice. The first time, OpenOffice takes a
while as it copies its setup to the user's directory, so I patiently waited for around 30 seconds and
nothing seemed to happen. I waited for a minute and knew that something was wrong!

Voila - Debugging begins!

I basically had to look for some sort of error message, so I started oowriter in an xterm.

vinay@debdungeon:~# oowriter

Not a single error message, and oowriter just did not start: no splash screen, no output, nothing. I
deleted .sversionrc and the .openoffice directory and started oowriter again.

vinay@debdungeon:~# rm .sversionrc; rm -fr .openoffice
vinay@debdungeon:~# oowriter
running openoffice.org setup...
Setup complete.  Running openoffice.org...

Now I could see that it had performed the setup and was trying to start oowriter, but nothing more
happened and it still did not start. I searched for debugging options for OpenOffice and ran it under
strace, logging to a file called ooolog.

vinay@debdungeon:~# OOO_DEBUG="strace -o ooolog" oowriter

Then I waited for around 20 seconds and pressed Ctrl+C to stop oowriter.

vinay@debdungeon:~# tail -n 200 ooolog| less

I found that towards the end it was filled with messages like this:

stat64("/tmp/OSL_PIPE_0_SingleOfficeIPC_4acd679a70dd792afe65dde68cb44c2", 0xbfffc63c) 
= -1 ENOENT (No such file or directory)

Immediately I tried to find the file in /tmp.

vinay@debdungeon:~# ls -la /tmp/OSL_PIP*
srwxrwxr-x  1 vinay vinay 0 Oct 11 12:22 /tmp/OSL_PIPE_1000_SingleOfficeIPC_4acd679a70dd792afe65d

On closer inspection I found that the two filenames differed in a peculiar way. The filename in the
strace log was 10 characters longer than the actual file in /tmp. If you look closely, the characters
de68cb44c2 are missing from the file created in /tmp.

Now things got interesting. I repeated the above process to check the strace log and the file in /tmp
again. Amazingly, even though the filenames were different this time, the difference was again exactly
10 characters. Why would 10 characters be cut off from the resulting file? And on top of that, this
happened only with oowriter and no other application that I ran!

The first thought that occurred to me was that the filename was probably too long. To confirm, I
created a file with the name found in the strace log.

vinay@debdungeon:~# touch /tmp/OSL_PIPE_0_SingleOfficeIPC_4acd679a70dd792afe65dde68cb44c2
vinay@debdungeon:~# ls -la /tmp/OSL_PIPE_0_SingleOfficeIPC_4acd679a70dd792afe65dde68cb44c2
-rw-r--r--  1 root root 0 Oct 11 13:56 /tmp/OSL_PIPE_0_SingleOfficeIPC_4acd679a70dd792afe65dde68cb44c2

The file was created without any problem. No characters were cut. So why could oowriter not create the
file? On further inspection, I noticed that the file created by oowriter was a socket.

The "s" at the start of the ls -la output showed that the file was a socket. The next thought that
occurred to me was that this was probably a problem with creating sockets, but I had gdm installed,
which also created a socket in /tmp, and I had logged into the machine using gdm. So I knew the problem
was specific to oowriter rather than common to all applications creating sockets.
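
The same "is it a socket?" check can be done from a script instead of reading ls output. The sketch
below is just an illustration in Python, which was not part of my setup; the helper names is_socket and
demo are made up, and the files are created in a throwaway temporary directory rather than in /tmp
itself.

```python
import os
import socket
import stat
import tempfile


def is_socket(path):
    """True if `path` is a Unix domain socket (the leading "s" in ls -l output)."""
    return stat.S_ISSOCK(os.lstat(path).st_mode)


def demo():
    with tempfile.TemporaryDirectory() as tmpdir:
        sock_path = os.path.join(tmpdir, "demo.sock")
        file_path = os.path.join(tmpdir, "plain.txt")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            # bind() on an AF_UNIX socket creates the socket file on disk
            s.bind(sock_path)
            # An ordinary empty file, like the one created with touch
            open(file_path, "w").close()
            return is_socket(sock_path), is_socket(file_path)


if __name__ == "__main__":
    print(demo())
```

The first value is for the bound socket and the second for the plain file, so the two cases that looked
alike by name are told apart by file type.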

Just to make sure that the OpenOffice version I had was working fine, I ran OpenOffice on the server
without any problems. So now I had a combination of an NFS-mounted rootfs, OpenOffice, a socket and a
long filename. To confirm that NFS was involved, I decided to put /tmp on a ramdisk instead of the NFS
mount. I added the following to my initialization scripts:

mkfs.ext2 /dev/ram0 1024
mount /dev/ram0 /tmp

The client booted, I started oowriter, and voila, it started without any problems. Now I was sure that
it had something to do with NFS, the socket and the long filename. After some further digging I finally
figured out the problem.

Unix (AF_UNIX) or local (AF_LOCAL) sockets are created using a struct sockaddr_un, defined in sys/un.h,
which has the following definition:

struct sockaddr_un
  {
    __SOCKADDR_COMMON (sun_);
    char sun_path[108];         /* Path name.  */
  };
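
The size limit of sun_path can be seen from user space with a small experiment. This is a rough sketch
that assumes Python 3 on a Linux box; try_bind and demo are made-up helper names. Note that Python
refuses an overlong path with an error instead of truncating it, so this only demonstrates the limit,
not the silent truncation seen with oowriter.

```python
import os
import socket
import tempfile


def try_bind(path):
    """Try to bind an AF_UNIX socket at `path`; return None on success, else the error text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        try:
            s.bind(path)
        except OSError as exc:
            return str(exc)
    # Clean up the socket file that a successful bind() left behind
    os.unlink(path)
    return None


def demo():
    with tempfile.TemporaryDirectory() as tmpdir:
        short = os.path.join(tmpdir, "s")
        # The file name alone is longer than the 108-byte sun_path buffer
        too_long = os.path.join(tmpdir, "x" * 120)
        return try_bind(short), try_bind(too_long)


if __name__ == "__main__":
    print(demo())
```

The short path binds fine, while the long one is rejected before a socket file is ever created.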

Now, on an NFS-mounted rootfs, what looks like /tmp to the client is actually a directory somewhere on
the server. The / directory on the client pointed to
"/mnt/disk1/work/nfs-client/rootfs/default/192.168.0.75" on the server. If I append the filename found
in the strace log to that, I get:

echo "/mnt/disk1/work/nfs-client/rootfs/default/192.168.0.75/"\
> "tmp/OSL_PIPE_0_SingleOfficeIPC_4acd679a70dd792afe65dde68cb44c2" | wc -c
118

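That is 118 bytes (the path itself is 117 characters; wc also counts the newline that echo appends,
which happens to match the terminating NUL the path needs), while the sun_path buffer in sockaddr_un is
only 108 bytes long. 118 - 108 = 10, which was the reason why 10 characters were always being cut off!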
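
The same arithmetic can be written as a small check for any candidate NFS root. This is a minimal
sketch that follows the explanation above, namely that the server-side path is what has to fit into
sun_path; the function name sun_path_overflow and the constants are made up for this example.

```python
SUN_PATH_SIZE = 108  # size of sun_path in the struct sockaddr_un shown earlier


def sun_path_overflow(nfs_root, client_path):
    """Bytes by which the server-side path plus its NUL terminator exceeds sun_path."""
    full = nfs_root.rstrip("/") + "/" + client_path.lstrip("/")
    needed = len(full.encode()) + 1  # +1 for the terminating NUL
    return max(0, needed - SUN_PATH_SIZE)


SOCKET = "/tmp/OSL_PIPE_0_SingleOfficeIPC_4acd679a70dd792afe65dde68cb44c2"
LONG_ROOT = "/mnt/disk1/work/nfs-client/rootfs/default/192.168.0.75"
SHORT_ROOT = "/nfs-client/rootfs"

if __name__ == "__main__":
    print(sun_path_overflow(LONG_ROOT, SOCKET))
    print(sun_path_overflow(SHORT_ROOT, SOCKET))
```

With the long root the path is over the limit; with the short root it fits, so a check like this could
be run before choosing where to export the rootfs from.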

Finally, I mounted / from "/nfs-client/rootfs" on the server instead, and OpenOffice started normally.