There's a certain cultural bankruptcy that shows itself in sequels. It suggests you're reduced to imitating yourself. But this isn't that kind of sequel. No, not the kind where T. rexes roam the city trying to make a living drawing cartoons, or Arnie switches from ammo boxes to ballots. This is the kind that gives a New Hope.
Yesterday, I had an outpouring of hate against the Linux capability model. But the problem turned out to be that setuid() resets all the capabilities. In hindsight that makes a lot of sense, but it didn't even strike me until the kernel people (y! has those too) got involved (and I didn't RTFM).
Enter prctl: The solution was to use the prctl() call with PR_SET_KEEPCAPS to ensure that the capabilities are not discarded when the effective user id of a process is changed. But even then, only the CAP_PERMITTED flags are retained; the CAP_EFFECTIVE flags are masked to zero.
So, with the prctl() call and another cap_set_proc() to restore CAP_EFFECTIVE, I was back on a roll. Here's the patch on top of unnice.c.
 #include <sys/resource.h>
+#include <sys/prctl.h>
@@ -26,12 +27,14 @@
   if(!fork()) {
+    prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0);
     /* child */
     if(setuid(nobody_uid) < 0) {
       perror("setuid");
     }
+    cap_set_proc(lcap);
     if(setpriority(PRIO_PROCESS, 0, getpriority(PRIO_PROCESS, 0) - 1) < 0)
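Stitched together, the whole dance looks roughly like this. This is a minimal standalone sketch rather than the actual unnice.c, and the uid of 65534 for "nobody" is an assumption - substitute whatever your system uses.

/* sketch: keep CAP_SYS_NICE across a setuid() to an unprivileged user,
 * then re-apply the effective set so setpriority() can raise priority.
 * build with gcc -lcap, run as root. */
#include <stdio.h>
#include <sys/capability.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void)
{
    uid_t nobody_uid = 65534;               /* assumed uid for "nobody" */
    cap_value_t cap_list[] = { CAP_SYS_NICE };
    cap_t lcap = cap_get_proc();

    /* mark CAP_SYS_NICE in both the permitted and effective sets */
    cap_set_flag(lcap, CAP_PERMITTED, 1, cap_list, CAP_SET);
    cap_set_flag(lcap, CAP_EFFECTIVE, 1, cap_list, CAP_SET);
    cap_set_proc(lcap);

    /* without this, setuid() below would wipe the permitted set too */
    prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0);

    if(setuid(nobody_uid) < 0)
        perror("setuid");

    /* setuid() cleared the effective set; put CAP_SYS_NICE back */
    cap_set_proc(lcap);

    /* with CAP_SYS_NICE effective, raising priority (lowering nice) works */
    if(setpriority(PRIO_PROCESS, 0, getpriority(PRIO_PROCESS, 0) - 1) < 0)
        perror("setpriority");

    cap_free(lcap);
    return 0;
}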
Thus concludes this adventure, and I hope this blog entry serves as a warning of things to come. Watch this space for more Tales! Of! INTEREST!
--Only great masters of style can succeed in being obtuse.
Running infinite loops is a tricky challenge. What happens to a process when a programmer writes an infinite loop should be familiar to all. The challenge is to not let that affect the *other* processes. There seemed to be a perfect solution to it - setrlimit().
The function lets you set soft and hard limits on CPU time, so that if a process exceeds the soft limit, a SIGXCPU is raised. The process can catch the signal and do something sensible. Basically, all that was required was for the process to call setpriority() on itself and let the Linux scheduler slow it down to a trickle.
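In code, the plan looked roughly like this - a minimal sketch where the 2/30 second limits and the nice value of 19 are arbitrary picks for illustration, not anything from the real setup.

#include <signal.h>
#include <sys/resource.h>

static void on_xcpu(int sig)
{
    (void)sig;
    /* not formally async-signal-safe, but enough for a sketch:
     * drop ourselves to nice 19, the lowest priority */
    setpriority(PRIO_PROCESS, 0, 19);
}

int main(void)
{
    struct rlimit rl = { .rlim_cur = 2, .rlim_max = 30 };  /* CPU seconds */

    signal(SIGXCPU, on_xcpu);     /* soft limit exceeded -> SIGXCPU */
    setrlimit(RLIMIT_CPU, &rl);   /* hard limit exceeded -> the process dies */

    for (;;)                      /* stand-in for the errant infinite loop */
        ;
}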
The catch is that a non-privileged process can lower its priority, but never raise it back. Linux capabilities, however, let you grant CAP_SYS_NICE to the process, which essentially lets a non-privileged process muck around with priority - down and up.
To begin with, /proc/sys/kernel/cap-bound is unbelievably confusing to use. It is a 32-bit-wide bit-mask in which bit 23 apparently corresponds to CAP_SYS_NICE. After much mucking around, I came to the conclusion that "-257" would be 0xFFFFFEFF - that is, ~(1 << 8), clearing only bit 8 - which disables nothing but CAP_SETPCAP. But even then the setpriority() call kept failing. Here's my test code.
cap_t lcap;
const unsigned cap_size = 1;
cap_value_t cap_list[] = {CAP_SYS_NICE};

lcap = cap_get_proc();
cap_set_flag(lcap, CAP_EFFECTIVE, cap_size, cap_list, CAP_SET);
cap_set_flag(lcap, CAP_PERMITTED, cap_size, cap_list, CAP_SET);
cap_set_proc(lcap);

if(setuid(nobody_uid) < 0)
    perror("setuid");

if(setpriority(PRIO_PROCESS, 0, getpriority(PRIO_PROCESS, 0) - 1) < 0)
    perror("setpriority");
Here's a link to the test case in a more compilable form. Build it with gcc -lcap and run it with sudo to test. Right now, my Ubuntu box (kernel 2.6.22) errors out with this message.
bash$ gcc -lcap -o unnice unnice.c
bash$ sudo ./unnice
0: =ep cap_setpcap-ep
setpriority: Permission denied
The core issue has to do with Apache child-process lifetimes. The only recourse for me is to kill the errant process after the bad infinite loop and have the parent spawn a new process with a normal priority. But that means blowing away nearly all of the local process cache, causing memory churn and, more than that, the annoyance of a documented feature not working.
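A rough sketch of that fallback, with assumed names and limits - this is generic code, not the actual Apache child handling: the parent forks a worker under a CPU rlimit, and when a runaway worker gets killed off, a fresh one is spawned at normal priority, along with a freshly emptied cache - which is exactly the churn complained about above.

#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void run_worker(void)
{
    struct rlimit rl = { .rlim_cur = 5, .rlim_max = 10 };  /* CPU seconds */
    setrlimit(RLIMIT_CPU, &rl);   /* a runaway worker dies at the limit */

    for (;;)                      /* stand-in for serving requests (badly) */
        ;
}

int main(void)
{
    for (;;) {
        pid_t pid = fork();
        if (pid == 0)
            run_worker();         /* never returns */

        int status;
        waitpid(pid, &status, 0);
        if (WIFSIGNALED(status))
            fprintf(stderr, "worker %d killed by signal %d, respawning\n",
                    (int)pid, WTERMSIG(status));
        /* the replacement starts over with an empty local cache */
    }
}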
This story currently has no ending, but if any kernel hackers are reading this and happen to know the answer, please email gopalv shift+2 php noshift+> net. And thus we prepare for a sequel (hopefully).
--I use technology in order to hate it more properly.
-- Nam June Paik
Having finally got my amd64 desktop onto a decent internet connection, I decided to wipe the FC3 install on the box and replace it with Ubuntu. I left my i686 Gentoo install alive and started out on an Ubuntu 6.06 LTS (aka Dapper Drake) install, sometime around 11 PM.
My hardware setup is not what I would call conventional. Inside my monster tower I have four SATA ports - two of them on the motherboard and a couple more on my RAID controller. The first port is occupied by my 10k RPM 80 gig drive, which is basically my boot drive. The other onboard port has a 120 gig drive, which is my secure (AES-256) storage and is hardly ever mounted. Two 200 gig SATA drives hang off the RAID controller (a Promise), and after a couple of abortive attempts with dmraid, they have ended up as discrete SATA disks instead of a RAID-0.
Now, as soon as the Dapper liveCD boots, the disks on the RAID subsystem show up as sda and sdb, while the onboard SATA controller gets relegated to sdc and sdd. I thought that was strange, but I picked the renamed partitions and went through a complete install with this setup. The first reboot just froze the machine with a pretty GRUB in the left corner. Modifying device.map and replacing the hd1 entries in the boot menu did nothing either. Whatever I tried, the machine just threw up a blank GRUB stage1.
As it turns out, the BIOS and GRUB still recognize the onboard SATA channels as the first pair of hard disks, but halfway through stage2 of the boot the kernel takes a flip and then it all goes south for the summer. Faced with the renaming, I tried connecting the boot drive to the RAID controller, just to bring it back to sda. But as it turns out, my BIOS can only boot off the onboard primaries. All of which left me up shit creek without a paddle.
Now, Dapper is not without its advantages. The liveCD installer meant that I could do more than just watch lines scroll by while the installer did its magic (anyone remember the Caldera Tetris game?). So I still had a functional internet browser even though my machine wasn't booting (GRUB ensured that I couldn't boot Gentoo either).
Thanks to the internet, I came to know that I wasn't alone (*play X-Files music*). Turns out somebody had run into this GRUB of Death before. There was also an open Launchpad bug in the helpful status called CONFIRMED, which basically meant I had run into something real (like an iceberg). As luck would have it, someone had already sent a patch to work around the problem in kernel land - and it had been rejected. All of this was no help at all, till I discovered magic beyond mere device names.
UUID mounts: /etc/fstab can use identifiers other than raw device names to uniquely identify drives. The first patch for this that I could find dates from 2001 - this must've been the *best kept secret* in Linux kernel land. As it turns out, both GRUB and /etc/fstab will accept volume UUIDs, completely transparently. But the installer still couldn't figure out where to put GRUB or what to put in it. At around 3 AM, I managed to pry open my machine, pull out the off-board SATA controller and attempt a reinstall.
The system reinstall went almost perfectly. All that was left was to replace the device names in the relevant sections with the correct UUID volume names; the unique identifiers can be read easily using the blkid command. A couple of reboots later, I plugged the RAID controller back in and added those partitions the same way.
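For anyone else stuck in the same spot, the end result looks roughly like this. The UUID, filesystem type and mount options below are made up for illustration, and the blkid output is trimmed.

bash$ sudo blkid
/dev/sda1: UUID="0a1b2c3d-4e5f-6789-abcd-ef0123456789" TYPE="ext3"

# /etc/fstab - mount the root filesystem by UUID instead of by sd?? name
UUID=0a1b2c3d-4e5f-6789-abcd-ef0123456789  /  ext3  defaults,errors=remount-ro  0  1

# /boot/grub/menu.lst - the kernel's root= parameter takes the same form
# kernel /boot/vmlinuz-<version> root=UUID=0a1b2c3d-4e5f-6789-abcd-ef0123456789 ro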
It all worked out, eventually. But the amount of effort it took to get it working reminded me of those nights in 1999, spent struggling with an X configuration, with some magazine's help section open.
--"First things first -- but not necessarily in that order."
-- Doctor Who