Archive for the ‘Linux’ Category

Certification exams and the like…

Ok, so just this past week I think I’ve gone a bit crazy with certification exam bookings. I booked myself in for two Linux certifications, which I was due to take yesterday. I say “due” because of what I found when I got to where the exam was meant to be: “Professional Computer College, 85-89 Duke Street, Liverpool, L1 5AP”. The building looked just about abandoned, except that through a ground-floor window I could just make out a noticeboard with some clippings about “Nerve”. A wee google later, it appears that they now occupy the building. The left-hand brass placard next to the door had been removed, so I only had the street number to go on. I rang the doorbell and what looked like a bodyguard (he had an earpiece with a spring cord) answered the door. I asked him if this building was Pearson Vue or the computer college; he said no and looked at me rather suspiciously. He didn’t offer any information or ask if I needed directions. He just closed the door rather quickly. As I walked away, looking up a number on my phone, I noticed that a silver Range Rover with blacked-out windows had pulled up rapidly outside the door I’d just been at. About three people, including the bodyguard guy I’d just spoken to, quickly walked out of the building, got into the back of the 4×4 and sped off. That was… interesting.

Anyway, back to the story. I tried contacting Pearson Vue (or “Piss-On-You”, as my sister honestly misheard on the phone) on three numbers. First I tried the two numbers they gave for the test centre, both of which just rang out. Next I tried the customer services number, which was auto-answered by a recorded message saying “Due to an incident no-one can answer the phone at the moment. Please try again later”. To say I was thoroughly pissed off would be an understatement. I walked up and down Duke Street twice more looking for a building with anything like “Pearson Vue” or “Professional Computer College” on a placard. No dice. By this time I’d spent about an hour looking for the place, and I eventually gave up after every possible route of contacting Pearson Vue had failed. I’m going to have to ring them on Monday and demand a refund. Watch this space.

So, the exams themselves… these ones are fairly basic, but I need to get them out of the way before I can move up to the more difficult (and more highly regarded) levels. In case you’re interested, the ones I’m taking are the LPI Level 1 exams. Meanwhile, in Windows land, I’m booked in for exam 70-290 at the end of July. I’ve been working with Server 2003 for, well, six years now; I’ve been on the course and read the book, so I’m pretty confident I’ll do OK. I just need to keep the “Microsoft way” in mind.

How to force mountd to use a static port on Red Hat

So I’ve been working with a very strict firewall on an AIX host which mounts NFS shares from Red Hat 5.3 hosts. NFS on Red Hat relies on the RPC portmapper (port 111) and nfsd itself (port 2049), both of which are static. Unfortunately it also relies on rpc.mountd (aka mountd), which by default doesn’t run on a static port: every time it starts up, it binds to whatever free port it gets and registers that port with the portmap service.

I just couldn’t have this happening on Red Hat, since the AIX firewall is locked down as tight as can be, with even anomalous outbound TCP ACKs being disallowed. The portmap service doesn’t hand out port numbers itself; it just records whichever port each RPC service registers. My hunch was that /etc/services might still have a say in which port mountd picks, so I decided to grab the port number mountd was currently running on…

rpcinfo -p | grep mountd

and make an entry in /etc/services, in the hope that rpc.mountd would see the mountd entry and automatically use that port number, and only that port number. For example:

mountd          672/tcp                         # Rob's Edit - binds mountd to a static port
mountd          672/udp                         # Rob's Edit - binds mountd to a static port
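
If you’d rather not copy the port across by hand, here’s a minimal sketch that reads `rpcinfo -p` output and prints matching /etc/services lines for each protocol mountd is registered on. It assumes the Linux rpcinfo column layout (program, vers, proto, port, name) with the service name in the fifth column; the function name `mountd_services_lines` is my own and purely hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn `rpcinfo -p` output (read on stdin) into
# /etc/services entries for mountd. Assumes the Linux column layout:
#   program vers proto port name
mountd_services_lines() {
    awk '$5 == "mountd" && !seen[$3]++ {
        # Emit one line per protocol (udp/tcp). mountd registers several
        # versions; we assume they share one port per protocol and keep
        # the first port seen for each.
        printf "mountd          %s/%s                         # binds mountd to a static port\n", $4, $3
    }'
}
```

Usage would be something like `rpcinfo -p | mountd_services_lines`, reviewing the output (and backing up /etc/services) before appending it.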

I restarted portmap and nfs, and ran rpcinfo again…

service portmap restart
service nfs restart
rpcinfo -p | grep mountd

… and lo and behold, rpc.mountd had bound to the static port specified.
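
To turn that eyeball check into something scriptable (say, after every `service nfs restart`), a small sketch like this succeeds only if mountd is registered and every registration uses the expected port. As above, it assumes the Linux `rpcinfo -p` layout, and `mountd_on_port` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check: exit 0 only if mountd appears in `rpcinfo -p`
# output (stdin) and every mountd registration uses port $1.
mountd_on_port() {
    awk -v want="$1" '
        $5 == "mountd" { found = 1; if ($4 != want) bad = 1 }
        END { if (!found || bad) exit 1 }
    '
}
```

For example: `rpcinfo -p | mountd_on_port 672 && echo "mountd is pinned to 672"`.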

Note to self… co-incidences in IT DO happen!

Ok, so on Friday I was working from home while recuperating after some surgery (don’t ask). I’m currently working on a large migration project with a very tight timescale, which is why I was working from home: neither I nor the company I work for can really afford for me to be away from this project for any length of time. So I’m working on a large IBM RS6000 AIX wide node where I need to create an NFS share for their new Red Hat based platform, which required a minor change to the genfilt/mkfilt rules on AIX to allow the new systems to access the NFS shares. I made the one-line change and reloaded the firewall on the system. Unfortunately this made NIS/YP fault and stop responding. Not such a big deal, except that this node is also a NIS server, so users authenticating from a frontend running on a thin node were unable to get in, which started to cause issues quickly. Fortunately existing users weren’t affected, but newly connecting users weren’t getting on.

As soon as I’d reloaded the firewall I could see that NIS had (inexplicably) failed, so I backed out the change. I had to get NIS back online, and reloading the YP services wasn’t working. With the change backed out, I reloaded the firewall again, and this time mkfilt just wasn’t having it. The syntax was fine, but the firewall was now blocking access to all services. Remember, I’m working from home, via an SSH session to a host at work with rlogin access to the wide node. As soon as the firewall started blocking traffic, my remote session died and I was unable to get back in. FUCK!

I get on the phone straight away to the DC and ask a colleague of mine, Brian, to re-run the firewall script from the control workstation, which has a direct, non-IP connection to the wide node. About 10 minutes later I get a call saying he’s been able to restart the firewall OK, and I can access the server from my connection again. Phew. NIS is still down, though, and still refuses to start up cleanly. A reboot is in order. By this time I’m pretty much ready to head into the DC so I can be hands-on with the kit when needed. Brian gets in touch with the client and co-ordinates a graceful shutdown of the databases before we initiate a standard reboot.

By the time I arrive at the DC (only 15 minutes away by car, fortunately), we’ve managed to arrange “unscheduled maintenance” time, and we bounce both nodes. Everything comes back up perfectly, and users can log back in just fine. We notify the client, who can see that everything’s OK: the databases have come up and everything’s back the way it was.

I’m just getting into finding out what caused NIS and the second firewall reload to spanner completely when we get another call from the company saying that LPR print queue jobs are not being passed from the thin node to a third server, which is running Caldera OpenLinux (yeah, I know!). This Caldera box runs Tarantella, which provides client-based printing. Users print from a terminal on the thin node to a remote print queue on the Caldera server, and the Tarantella server then maps each user’s own printer to their print queue on the Caldera server, allowing (in a very round-about way) client-based printing from a terminal. For turn-of-the-century stuff this was quite advanced, since there was otherwise no way to do this dynamically from a web-based (HTTPS) client without setting up static routed print queues on the node.

So that’s the background. This printing issue had been an intermittent problem ever since the platform was introduced, but it had normally been resolved by a simple reboot of the Caldera server. We decided that, since the nodes had been down, this had likely caused a bottleneck between the servers, and that the Caldera box needed a reboot to clear it and get the print queues moving again. We bounce the box and the jobs are still being held on the thin node. FRAK! I know beyond a doubt that the issue isn’t software-firewall related, since my minor change (a) wouldn’t have affected port 515 communications and (b) the firewall is running OK.

My boss, John, had become involved around the time we rebooted both nodes, as he was interested to know what was going on. After being brought up to speed, he was convinced that this was a firewall-related issue, since the initial cause had been firewall related and I’d asked our network manager to add new rules to allow NFS between the new and old platforms. I knew it was highly unlikely to be a firewall problem, since the changes had been backed out and the system was in its normal, default configuration, but John felt the timing was just too close to be anything other than a firewall issue. We spent a while looking at the firewall rules in place to see if any hits were being matched on the Ciscos (they weren’t), telnetting to ports and so on, all of which was fruitless.

It was obvious in my mind that something on the Caldera box was stopping the LPD daemon from responding properly. After looking through the Tarantella logs, I checked /var/log/messages and saw that the LPD daemon had faulted at start-up with the error “not enough disk space”. That old chestnut.
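
The “telnetting to ports” part of that hunt can be scripted. Here’s a minimal sketch of a TCP port probe, assuming bash (for its `/dev/tcp` redirection) and GNU coreutils `timeout` are available, neither of which is a given on an old Caldera or AIX box; `port_open` is a hypothetical name. A successful connect only proves something is listening, not that LPD will accept jobs.

```shell
#!/bin/sh
# Sketch of a telnet-free TCP port check.
# Returns 0 if a TCP connection to host $1, port $2 succeeds.
# Relies on bash's /dev/tcp redirection and coreutils `timeout` (assumptions).
port_open() {
    # Give up after 3 seconds so a filtered port doesn't hang the check.
    # Inside bash -c, $0 is the host and $1 is the port.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, with a placeholder host name: `port_open caldera-host 515 && echo "something is listening on 515"`.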

After a little more digging, it turned out that the / partition, with only 2GB of space, had slowly been filled up by Apache access and error logs since the early 2000s. Monitoring hadn’t been set up to check disk space usage, which beggared belief, but there it is. The Apache logs had filled the last of the available disk space at pretty much exactly the same time as the AIX system had gone down. All that time wasted checking firewall rules, and all the printing problem came down to was a frakking simple thing: disk space.
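
The missing monitoring doesn’t need to be elaborate. As a minimal sketch (the function name and the 90% threshold are my own assumptions, not anything that existed on the Caldera box), a cron-able check over POSIX `df -P` output could look like this; it assumes mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical disk-space check. Reads `df -P` output on stdin and prints
# "mountpoint use%" for every filesystem at or above the threshold in $1
# (a percentage). Exits 1 if anything crossed the threshold, so cron or a
# monitoring wrapper can alert on it.
disk_over_threshold() {
    awk -v limit="$1" '
        NR == 1 { next }                  # skip the df header line
        {
            use = $5
            sub(/%$/, "", use)            # "97%" -> "97"
            if (use + 0 >= limit + 0) { print $6, $5; hit = 1 }
        }
        END { if (hit) exit 1 }
    '
}
```

For example, `df -P / | disk_over_threshold 90 || echo "root filesystem nearly full"`. A check like this on / would most likely have flagged the growing Apache logs long before the disk filled.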

So the moral of this story is, blind co-incidence DOES happen in this profession, and it’s something that I’ll definitely remember for the rest of my career!