< July 2006 >
SuMoTuWeThFrSa
       1
2 3 4 5 6 7 8
9101112131415
16171819202122
23242526272829
3031     
Thu, 27 Jul 2006:

How many times have you run into a CD which would let you play a file off the disk, but wouldn't let you copy it out ? It is one of those irritating problems, which should really have a simple solution. Most AVI files are fairly error tolerant, they wouldn't really care if a few KB are missing from the video stream, as long as they can locate the next key-frame, it will keep playing forward. So playing an AVI off scratched media will often work without much disturbance, but it would be near impossible for a normal copy program to create a copy of the file.

Even though you can accomplish the same task with dd noerror, it tries to recover every bit of data possible which is not something that you require when you are trying to copy a large number of files off a DVD. So I basically hacked up a small program which would leave zero-padded 4k holes wherever the disk area wasn't readable. The program accomplishes this by the easiest way possible - by running the following in a loop.

if(i < len) 
{ 
    if(fork()) 
    {
        i += BLOCK_SIZE;
    } 
    else 
    {
        copyblock(dest, src, i, BLOCK_SIZE, len);
        exit(0);
    }
}

Now, that'd obviously result in a very near fork-bomb of your machine while copying a 700Mb file. So I added code so that a fixed number of processes are spawned to start off while the next process is started off only after one of the processes die off. Also, to improve the copying efficency I increased the BLOCK_SIZE to 64k but still wanted to make sure that the size of the errored blocks weren't more than 4k. So I added a retry section, which would spawn 16 processes each handling a 4k block of the errored 64k.

deadproc = wait(&status);

if(WIFSIGNALED(status) && WTERMSIG(status) == SIGBUS)
{
    /* retry */
}

Now the awesome part of this program is not how it works. The really cool thing is to attach strace to this and watch the fork() in operation as well as the SIGCHILD return back. Also it is a very simple example of how something like APC shares memory between all the apache children with mmap (though without any locks). There might just be a better approach for this particular problem, using signal(SIGBUS), but that is left as an exercise to the reader.

Here's the code for the curious. And remember, if you are still running on x86 32 bit, memory mapping big files might cause your OS to run out of address space.

--
(1) Never draw what you can copy.
(2) Never copy what you can trace.
(3) Never trace what you can cut out and paste down.

posted at: 11:44 | path: /hacks | permalink | Tags: ,