Fri, 16 Jun 2006:

So I'm travelling to Ladakh for a two-week trip. Ladakh literally means "land of high passes" and is sandwiched between the Karakoram and Himalayan ranges. The area is replete with lakes, steep passes and rugged beauty. It is almost akin to visiting Tibet without actually leaving Indian territory. J&K is not known for its political stability and I don't expect a smooth passage through Ladakh, though I do plan to arrive back home in one piece and alive.

Leh is probably going to be the base camp for all of our expeditions outwards. The planned route runs over Rohtang La (13,000+ ft) and up to passes at 17,500 feet. These passes are open only between July and September, but we are hoping that the midsummer sun has cleared enough snow for us to take this route rather than the Srinagar-Kargil road (NH1), which was the scene of the 1999 exchange of fire with Pakistan. Anyway, the first stop on the route is the hill station of Manali, which is a pleasant enough place and lies on the ancient Silk Route towards the Middle East.

This trip is so cool that I had to buy extra sweaters for it. Jokes aside, I have never seen snow in my life. I have always lived in tropical climes and am accustomed to 20-35 °C days. So I expect the mountains to be a challenge to get used to, on top of the usual altitude sickness. All in all, I think it will be one of those experiences you can look back on and draw strength from, not just a fun fortnight out in the hills. I can already hear myself say "I can do this".

Wish me luck.

--
What did Mohammed do when the mountain didn't come to him?

posted at: 15:53 | path: /travels

Thu, 15 Jun 2006:

For almost two years, I've been trying to explore and understand the social situations that have slowly built up the strict roles assigned to the sexes. From the husband bringing home the bacon to the housewife packing lunches for the children, it has always struck me as social conformity rather than a natural state of affairs. But with every bit of new knowledge I gather, I have come to understand that this is merely a snapshot in the life of mankind (womankind too). With all that in mind, I bought yet another book on genetics - Adam's Curse.

'To be born woman is to know
Although they do not talk of it at school
That we must labour to be beautiful.'
I said, 'It's certain there is no fine thing
Since Adam's fall but needs much labouring.

                     -- Adam's Curse, Yeats

We are all born women. Until about the sixth week, a foetus develops identically for both sexes. But why do we men need to exist in the first place? As it turns out, our cells carry genes other than the nuclear DNA that we exchange and intermix during reproduction. The reason men exist is that when two normal haploid cells combine, the mitochondrial DNA of the two cells ends up fighting to the death. Somewhere in the dark distant past, cells discovered that they could fix this match by depriving one cell of most of its cytoplasm. Those crippled cells, with a few mitochondria churning out oxygen free radicals and literally burning themselves up, ended up as the male gamete. And they fought the system by making up for it in numbers.

But that can't be the only reason to have all these rampaging, testosterone-fuelled men? Well, it turns out that the mutation rate in sperm is much higher than in eggs, and it all explodes into randomness. The Y chromosome is the only record of this chaos, since it does not recombine with a partner chromosome and so cannot undo the mutation damage done to it. It is on a slow but sure path to ending up as completely damaged junk DNA.

With the current population, and extrapolating the mutation rate over the next 5000 years, the Y chromosome would have to be exceptionally lucky to survive that many cell divisions. Unless the non-junk segments of our genes manage to cross over onto some other chromosome (the Adonis chromosome, according to Bryan Sykes), the future of men is a dead end.

We're merely a passing fad.

--
<he> Was it good for you ?
<she> Read my blog.

posted at: 16:47 | path: /books

When the month of May dawned, I promised myself I'd walk 221 km before the month was through. But 'twas not to be: I got called home for more than a week and all that energy got channelled into a different use. Still, in the last thirty days I've been in Bangalore, I've walked 261 km, which is roughly the equivalent of walking in to work every day. Over the period, I have gained 4 1/2 kilograms, reintroduced breakfast into my diet and cut my coffee consumption down to just six cups a day.

Take a walk. It will change your life.

--
I love walking in the rain, 'cause then no-one knows I'm crying.

posted at: 15:57 | path: /me

Tue, 13 Jun 2006:

As authors, both George Orwell and Aldous Huxley were masters of their craft. But as visionaries (yes, for the last time, 1984 is a warning, not a guidebook), they differed in a very fundamental way. Orwell warned of an external oppressor who would conquer us and rule our thoughts, our lives and the whole world - the infamous Big Brother. Huxley, on the other hand, portrayed an even more outlandish concept, where people accept, and in fact love, the thing that incapacitates them from rational thought. In Brave New World, there is no need for an oppressor to deprive us of our autonomy, individuality or maturity; we gladly give them up for the security and convenience the oppression offers.

Orwell imagined a future where information would be denied, kept hidden from the masses and handed out in small parcels: a totalitarian regime where information is the currency and control is achieved by withholding it. Huxley feared the opposite, where the important information is drowned in a sea of irrelevance, and nobody picks up a book because there are far more convenient distractions to choose from.

1984 controls people through pain: you get hurt when you try to enter the forbidden corridors of knowledge. Brave New World enslaves you with pleasure, giving you so much that you have no desire for anything more, perfectly content to watch the feelies and drink soma. People are reduced to passivity and egotism, ever fearful of any disruption that would destroy the comforts they traded free speech and thought for, yet oblivious to their own slavery.

Yet, when the year 1984 came, there were those who rejoiced that the world hadn't fallen to a Big Brother. But Brave New World can't be dismissed so easily; the quest for happy living strays too close to the ultimate paradise of ignorant bliss. As you watch an everyman sit in front of a TV, sipping whatever gets him high and wondering what exactly is happening with Paris Hilton's latest boyfriend, you do have to ask: is this a Brave New World? The critique of the free press and its role in oppression runs contrary to common belief, but some corner of my mind says it is happening today (OK, pick up a Times of India).

Huxley's message is chilling in its content and cynical in its perception. The idea that people will sacrifice the essential liberties of free speech and thought to enjoy a comfortable life stands in opposition to the loss of paradise that Adam and Eve suffered. Deep in our psyche, this is a holy grail we yearn for, even at the cost of our individuality, history or autonomy.

1984 ends the same way Brave New World begins: with love instead of hate.

He gazed up at the enormous face. Forty years it had taken him to learn what kind of smile was
hidden beneath the dark moustache. O cruel, needless misunderstanding! O stubborn, self-willed 
exile from the loving breast! Two gin-scented tears trickled down the sides of his nose. But 
it was all right, everything was all right, the struggle was finished. He had won the victory 
over himself. 

He loved Big Brother.

Both describe a golden cage that we built for ourselves; the only way they differ is in how we got into the cage. So who is right? I'd say Huxley, but 1984 is yet to pass.

--
If society fits you comfortably enough, you call it freedom.
               -- Robert Frost

posted at: 18:12 | path: /philosophy

Been pissed off all day because of the stupid advertisements in Yahoo! Groups emails. Every mail I read has a very irritating sidebar that I have no use for. So I added the following 3 lines to my ~/.thunderbird/*default/chrome/userContent.css.

#ygrp-sponsor, #ygrp-ft, #ygrp-actbar, #ygrp-vitnav {
	display: none;
}

Much better. Ad blocking is one of those things where being in the minority is sometimes an advantage.

--
If we don't watch the advertisements it's like we're stealing TV.
                 -- Homer Simpson

posted at: 15:31 | path: /hacks

Mon, 12 Jun 2006:

Since I do not run any server-side code, I'm always playing around with new client-side tricks, for example the XSL in my RSS feed or the sudoku solver. Recently I was playing with the HTML Canvas and wondered whether there was some other way to generate images client-side with JavaScript. And it turns out that there is.

The tool of choice for moving image data from a JavaScript variable into a real image is a data: URI. It also helps that Windows bitmaps are so easy to generate; the format was designed with simplicity in mind. Here's a chunk of code which should be mostly self-explanatory.

function Bitmap(width, height, background)
{
  this.height = height;
  this.width = width;
  this.background = background || 0; /* colour for pixels never set */
  this.frame = new Array(height * width);
}

Bitmap.prototype.setPixel = function setPixel(x, y, c) {
  var offset = (y * this.width) + x;
  /* remember that they are integers and not bytes :) */
  this.frame[offset] = c;
};

Bitmap.prototype.render = function render() {
  /* 24bpp rows must be padded out to a multiple of 4 bytes */
  var rowSize = Math.ceil((this.width * 3) / 4) * 4;
  var padding = rowSize - (this.width * 3);
  var s = new StringStream();
  s.writeString("BM");
  s.writeInt32(14 + 40 + (rowSize * this.height)); /* bfSize */
  s.writeInt32(0); /* bfReserved1, bfReserved2 */
  s.writeInt32(14 + 40); /* bfOffBits */
  /* 14 bytes done, now writing the 40 byte BITMAPINFOHEADER */
  s.writeInt32(40); /* biSize == sizeof(BITMAPINFOHEADER) */
  s.writeInt32(this.width);
  s.writeInt32(this.height); /* positive height: rows stored bottom-up */
  s.writeUInt16(1); /* biPlanes */
  s.writeUInt16(24); /* bitcount 24 bpp RGB */
  s.writeInt32(0); /* biCompression */
  s.writeInt32(rowSize * this.height); /* biSizeImage, padding included */
  s.writeInt32(3780); /* biXPelsPerMeter */
  s.writeInt32(3780); /* biYPelsPerMeter */
  s.writeInt32(0); /* biClrUsed */
  s.writeInt32(0); /* biClrImportant */
  /* 54 bytes done and we can start writing the data, bottom row first */
  for (var y = this.height - 1; y >= 0; y--)
  {
    for (var x = 0; x < this.width; x++)
    {
      var offset = (y * this.width) + x;
      var c = this.frame[offset];
      s.writePixel(c === undefined ? this.background : c);
    }
    /* assumes writeString emits one byte per character, as with "BM" */
    for (var p = 0; p < padding; p++)
      s.writeString("\0");
  }
  return s;
};
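For comparison, here is the same 54-byte header and bottom-up pixel layout as a small C sketch. The function names are my own and it assumes pixels are stored as 0xRRGGBB integers; it is not the code behind the demo. It spells out two details that the JavaScript hands off to StringStream: every multi-byte field is little-endian, and each pixel row is padded to a multiple of 4 bytes, which BMP requires whenever width * 3 is not already a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Write v as little-endian bytes; BMP header fields are always little-endian. */
static unsigned char *put_le16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)(v >> 8);
    return p + 2;
}

static unsigned char *put_le32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(v >> (8 * i));
    return p + 4;
}

/* Bytes per pixel row at 24bpp, rounded up to a multiple of 4. */
static size_t bmp_row_size(uint32_t width)
{
    return ((size_t)width * 3 + 3) & ~(size_t)3;
}

/* 14-byte file header + 40-byte BITMAPINFOHEADER + padded rows. */
size_t bmp_file_size(uint32_t width, uint32_t height)
{
    return 14 + 40 + bmp_row_size(width) * height;
}

/* Render a width x height frame of 0xRRGGBB pixels into out, which must hold
 * bmp_file_size(width, height) bytes. Returns the number of bytes written. */
size_t bmp_render(const uint32_t *frame, uint32_t width, uint32_t height,
                  unsigned char *out)
{
    size_t row = bmp_row_size(width);
    uint32_t image_size = (uint32_t)(row * height);
    unsigned char *p = out;

    /* BITMAPFILEHEADER */
    *p++ = 'B';
    *p++ = 'M';
    p = put_le32(p, 14 + 40 + image_size); /* bfSize */
    p = put_le32(p, 0);                    /* bfReserved1, bfReserved2 */
    p = put_le32(p, 14 + 40);              /* bfOffBits */

    /* BITMAPINFOHEADER */
    p = put_le32(p, 40);         /* biSize */
    p = put_le32(p, width);
    p = put_le32(p, height);     /* positive height: rows stored bottom-up */
    p = put_le16(p, 1);          /* biPlanes */
    p = put_le16(p, 24);         /* biBitCount */
    p = put_le32(p, 0);          /* biCompression = BI_RGB */
    p = put_le32(p, image_size); /* biSizeImage, padding included */
    p = put_le32(p, 3780);       /* biXPelsPerMeter (about 96 dpi) */
    p = put_le32(p, 3780);       /* biYPelsPerMeter */
    p = put_le32(p, 0);          /* biClrUsed */
    p = put_le32(p, 0);          /* biClrImportant */

    /* Pixel data: last row first, each pixel as B, G, R, then row padding. */
    for (uint32_t y = height; y-- > 0;) {
        unsigned char *row_start = p;
        for (uint32_t x = 0; x < width; x++) {
            uint32_t c = frame[(size_t)y * width + x];
            *p++ = (unsigned char)(c & 0xff);         /* blue */
            *p++ = (unsigned char)((c >> 8) & 0xff);  /* green */
            *p++ = (unsigned char)((c >> 16) & 0xff); /* red */
        }
        while ((size_t)(p - row_start) < row)
            *p++ = 0;
    }
    return (size_t)(p - out);
}
```

A 1x1 image comes out at 58 bytes: the 54-byte header plus one 3-byte pixel and one byte of padding.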

Well, that was easy. Now all you have to do is generate a base64 stream from the string and put it in a data: URL. All in all, it took a few hours of coding to get JavaScript to churn out proper little-endian binary data for int32s and uint16s. It also takes a huge chunk of memory while running, because I concatenate a large number of strings. Ideally StringStream should have kept an array of strings and joined them into one string at the end, avoiding the few hundred allocations the code currently does. But why optimize something when you could sleep instead?
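The base64 step is just as mechanical: every 3 input bytes become 4 characters from a 64-symbol alphabet, with '=' padding at the end. Here is a minimal C sketch (make_data_uri is a hypothetical name, not part of the demo) that computes the output size up front and fills a single buffer, which is the same idea as collecting the pieces and joining them once instead of concatenating over and over.

```c
#include <stdlib.h>
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Returns a malloc'd "data:<mime>;base64,..." string, or NULL on failure. */
char *make_data_uri(const char *mime, const unsigned char *data, size_t len)
{
    size_t prefix = strlen("data:") + strlen(mime) + strlen(";base64,");
    size_t b64len = 4 * ((len + 2) / 3); /* 4 chars per started 3-byte group */
    char *uri = malloc(prefix + b64len + 1); /* sized once, no reallocs */
    if (uri == NULL)
        return NULL;

    strcpy(uri, "data:");
    strcat(uri, mime);
    strcat(uri, ";base64,");
    char *p = uri + prefix;

    for (size_t i = 0; i < len; i += 3) {
        /* Pack up to 3 bytes into one 24-bit group. */
        unsigned long n = (unsigned long)data[i] << 16;
        if (i + 1 < len)
            n |= (unsigned long)data[i + 1] << 8;
        if (i + 2 < len)
            n |= data[i + 2];

        /* Emit four 6-bit symbols, padding with '=' past the end of input. */
        *p++ = B64[(n >> 18) & 63];
        *p++ = B64[(n >> 12) & 63];
        *p++ = (i + 1 < len) ? B64[(n >> 6) & 63] : '=';
        *p++ = (i + 2 < len) ? B64[n & 63] : '=';
    }
    *p = '\0';
    return uri;
}
```

For a bitmap the MIME type would be image/bmp; the caller frees the returned string.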

Anyway, if you want a closer look at the complete code, here's a pretty decent demo.

--
Curiosity is pointless.

posted at: 14:44 | path: /hacks

Fri, 09 Jun 2006:

After many trials and tribulations, Wikipedia is finally using APC. They've been playing around with Turck MMCache and other accelerators for a while, but right now APC is the only one with the ball as far as caching is concerned. Recently, somebody ran a benchmark of the common accelerators used in PHP land - read it here. At that point APC came out ahead on two of the three tests, though my commit last night has probably pushed APC below eAccelerator; that commit is required for APC to run properly on multi-CPU Apache under high load.

               hw.php   deserialize.php   include-pma.php
eAccelerator     1093               160                86
APC              1100               163                83
PHP alone         886               157                28

Now, the next heavy user of PHP around is sourceforge.net, which is apparently still using eAccelerator. APC still has a few chinks in its armour, but it is *my* work. And, much more importantly, for once it doesn't appear doomed :)

--
Real programs don't eat cache.

posted at: 22:14 | path: /php

Thu, 08 Jun 2006:

I recently discovered an easy way to inspect PHP files: the xdebug CVS repository has a module called vle, which prints out the bytecode generated for PHP code. This lets me look at the bytecode generated for a particular PHP data or control structure. The extension shines a bright light into the otherwise dark world of Zend Engine 2.

Let me pick on my favourite example of mis-optimisation in PHP land - the heredoc. People use heredocs very heavily, in preference to more mundane ways of putting strings in a file such as the double-quoted world of the common man. Some PHP programmers even take special pride in using heredocs rather than quoted strings. Most programmers coming from C use them for multi-line strings, not realizing that PHP quoted strings can span lines.

<?php

echo <<<EOF
	Hello World
EOF;

?>

That snippet generates some really ugly, unoptimised bytecode. Don't believe me? Just look at the vle dump (the tab shows up as '%09' and the space as '+').

line     #  op            ext  operands
-------------------------------------------------
   3     0  INIT_STRING        ~0
         1  ADD_STRING         ~0, ~0, '%09'
         2  ADD_STRING         ~0, ~0, 'Hello'
         3  ADD_STRING         ~0, ~0, '+'
         4  ADD_STRING         ~0, ~0, 'World'
   4     5  ADD_STRING         ~0, ~0, ''
         6  ECHO                   ~0

That's right: every single word is separately appended to a temporary string, and after all the appends with their corresponding reallocs, the string is echoed and thrown away. A really wasteful operation, right? Well, it is, unless you run it through APC's peephole add_string optimizer.
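The idea behind such a peephole pass is simple: a run of consecutive ADD_STRING ops appending constants to the same temporary can be collapsed into one op carrying the concatenated constant. Below is a toy C sketch of that merge over a made-up opcode array. The types and functions (struct op, merge_add_string, op_const) are hypothetical stand-ins and do not mirror APC's or Zend's real structures.

```c
#include <stdlib.h>
#include <string.h>

enum opcode { OP_INIT_STRING, OP_ADD_STRING, OP_ECHO };

/* Toy instruction: an opcode, a temporary slot and an optional constant. */
struct op {
    enum opcode code;
    int tmp;
    char *str; /* heap-allocated constant for OP_ADD_STRING, else NULL */
};

/* Heap copy of a constant, so merged strings can later be realloc'd. */
char *op_const(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        abort();
    return strcpy(copy, s);
}

/* Merge runs of OP_ADD_STRING that target the same temporary into a single
 * op holding the concatenated constant. Compacts ops in place and returns
 * the new op count. */
size_t merge_add_string(struct op *ops, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        struct op *prev = out > 0 ? &ops[out - 1] : NULL;
        if (prev && prev->code == OP_ADD_STRING &&
            ops[i].code == OP_ADD_STRING && prev->tmp == ops[i].tmp) {
            size_t a = strlen(prev->str), b = strlen(ops[i].str);
            char *joined = realloc(prev->str, a + b + 1);
            if (joined == NULL)
                abort();
            memcpy(joined + a, ops[i].str, b + 1); /* includes the NUL */
            prev->str = joined;
            free(ops[i].str); /* this op is dropped */
            continue;
        }
        ops[out++] = ops[i];
    }
    return out;
}
```

Applied to the heredoc dump, the five ADD_STRING ops collapse into one, so INIT_STRING, ADD_STRING and ECHO remain.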

Another misleading item in the arsenal is constant arrays. I see hundreds of people use PHP arrays in include files to speed up their code, which does indeed work. But a closer look at the array bytecode shows a few chinks which can actually be fixed in APC land.

<?php

$a = array("x" => "z",
           "a" => "b",
           "b" => "c",
           "c" => "d");
?>

This generates the following, fairly predictable, bytecode. To be fair, it is hardly different from what most other VMs store in their bytecode, but those VMs are limited by having to actually write the code (minus pointers) to a file. Zend's bytecode lives completely in memory, so it could have used a prebuilt memory layout for these arrays (which would have made APC segfault months before I ran into the default array args issue).

line     #  op                      ext  operands
-----------------------------------------------------------
   2     0  INIT_ARRAY                   ~0, 'z', 'x'
   3     1  ADD_ARRAY_ELEMENT            ~0, 'b', 'a'
   4     2  ADD_ARRAY_ELEMENT            ~0, 'c', 'b'
   5     3  ADD_ARRAY_ELEMENT            ~0, 'd', 'c'
         4  ASSIGN                           !0, ~0
   7     5  RETURN                           1
         6  ZEND_HANDLE_EXCEPTION            

APC doesn't optimize this yet, and I think I'll do it sometime soon. After all, I just need to virtually execute the array additions and cache the resulting hash as the operand of the ASSIGN, instead of going through this stupidity every time the code is executed.
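That "virtual execution" can be sketched in a few lines: if an INIT_ARRAY is followed only by ADD_ARRAY_ELEMENTs and every key and value is a literal, run them once at compile time and keep the finished table. The toy C below uses a tiny fixed-size map as a stand-in for a PHP hash; all the types and names are hypothetical, not APC's.

```c
#include <stddef.h>
#include <string.h>

enum arr_opcode { A_INIT_ARRAY, A_ADD_ARRAY_ELEMENT, A_ASSIGN_CONST, A_OTHER };

struct arr_op {
    enum arr_opcode code;
    const char *key;   /* literal key, or NULL if not a constant */
    const char *value; /* literal value, or NULL if not a constant */
};

#define MAX_PAIRS 16

/* Stand-in for a PHP hash table: insertion-ordered string keys and values. */
struct const_array {
    size_t count;
    const char *keys[MAX_PAIRS];
    const char *values[MAX_PAIRS];
};

/* Store key => value, overwriting an existing key like a PHP array would. */
static int ca_set(struct const_array *a, const char *key, const char *value)
{
    for (size_t i = 0; i < a->count; i++)
        if (strcmp(a->keys[i], key) == 0) {
            a->values[i] = value;
            return 1;
        }
    if (a->count == MAX_PAIRS)
        return 0;
    a->keys[a->count] = key;
    a->values[a->count++] = value;
    return 1;
}

/* If ops[0..n) is INIT_ARRAY followed only by ADD_ARRAY_ELEMENTs, all with
 * literal operands, build the array into *out and return 1, so the caller
 * can replace the run with a single A_ASSIGN_CONST. Otherwise return 0. */
int fold_const_array(const struct arr_op *ops, size_t n, struct const_array *out)
{
    out->count = 0;
    if (n == 0 || ops[0].code != A_INIT_ARRAY)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && ops[i].code != A_ADD_ARRAY_ELEMENT)
            return 0;
        if (ops[i].key == NULL || ops[i].value == NULL)
            return 0; /* a variable operand: must run at request time */
        if (!ca_set(out, ops[i].key, ops[i].value))
            return 0;
    }
    return 1;
}
```

The design choice is to bail out on the first non-literal operand, since anything computed at request time cannot be baked into the cache.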

Like rhysw said, "Make it work, then make it work better".

--
Organizations can grow faster than their brains can manage them.
                    -- The Brontosaurus Principle

posted at: 16:22 | path: /php

Wed, 07 Jun 2006:

The question of which came first, the chicken or the egg, exists because an egg is clearly not a chicken.
#15486106

Simple, indubitable and unequivocal.

--
A chicken is an egg's way of producing more eggs.

posted at: 18:37 | path: /philosophy

Twenty years we've been together, fought our fights from day one,
bit, kicked and scratched each other till our parents did us part,
known our light sides, suffered our dark sides,
laughed, cried and dried many a tear together.
If not for that, what else are brothers for?

Happy Birthday to my favourite lawyer.

--
Although it's never fun getting one year older, it sure beats the alternative!

posted at: 15:58 | path: /me

As a huge fan of the Simpsons, when I saw the book Planet Simpson lying around in the Science section (whaa?) of Landmark, I couldn't resist picking it up and having a browse. I ended up going home with the book paid for and a resolution to read it through properly (all 500 pages of it).

After reading the book through its entire length and breadth, I'm quite unimpressed. The book is really about Planet Earth, describing exactly where the Simpsons intersects this very down-to-earth place. The hypocritical clergy, the corrupt politicians, the town drunkard, the evil corporations are all picked out of the background and given their day in the sun. Even characters like Troy McClure (voiced by the late Phil Hartman) and Apu Nahasapeemapetilon get their due analysis and are compared to real-world celebrities and immigrants.

As thorough as the book is, it wastes quite a bit of time putting the episodes in their time and place, explaining the emergence of punk rock and grunge and how the social commentary carried in the Simpsons came into being. The world of the Simpsons suddenly isn't too different from ours, only carried a bit further into yellow-skinned incredulity. After all, it wasn't the Planet of the Apes and neither is it planet Simpson; it's merely home.

Now, I don't think I'd ever be satisfied by anything other than a hardbound snpp.com, but this book is pretty good at pointing out the show's golden age and exactly when it jumped the shark.

I'm happy that I read it, but I don't think I'll reread it often (unlike my Wodehouse, Douglas Adams or Pratchett collections), so I'd rather have borrowed this eulogy to the Simpsons than bought it. Next up for perusal are Nancy Cartwright's My Life as a 10-Year-Old Boy (aka Bart) and the Matt Groening guidebook to the Simpsons, One Step Beyond Forever. Maybe soon, maybe never.

--
"Me fail English? That's unpossible."
          -- Ralph Wiggum

posted at: 15:01 | path: /books

Mon, 05 Jun 2006:

I spent most of last weekend reading books in bed. My latest acquisition is a hardbound by Jasper Fforde. I first encountered this author in Rhys's possession, in a book called The Eyre Affair, which came with the recommendation 'It's absolutely crazy'. And it turned out to be exactly that, surreal in the extreme, though you are left with a lingering doubt whether all this could really happen. There is just enough reality mixed in to make you wonder, just like the time you saw The Truman Show. The current tome under inspection is titled The Big Over Easy, and it does nothing to diminish my opinion.

The book tries to transcend reality by introducing nursery rhyme characters, including anthropomorphic animals and the usual fare of animated pastries, girls with 28 feet of hair and Solomon Grundy as an old man. The book opens with the trial of the Three Little Pigs for the premeditated murder of one Mr Wolff. The case hinged on the fact that the pot of water the 'Big Bad Wolf' fell into would have taken six hours to come to the boil, which indicated premeditation. Since the pigs were tried by a jury of their peers, that is, a baker's dozen of pigs, they walked scot-free. Then there are the detectives, whose guild treats public approval as its currency and traces its history back to Sherlock Holmes (as if he were real). Each guild detective has an assigned Official Sidekick whose duties include writing up a passable account for the mystery-hungry magazines. So there's the grungy, bitter man stuck in a rut, Jack Spratt, and the ambitious career detective Friedland Chymes. Both started from the same humble beginnings in the Nursery Crimes Division, but while Friedland is ranked #2 in popularity, Spratt isn't even on the list.

And then comes a case even nuttier than a Christmas fruit cake. Humpty Dumpty had a great fall, but was he pushed? Well, as it turns out, his ex-wife shot someone else thinking it was Humpty, his current lover put poison in his coffee, the husband of a previous flame (who happened to be Rapunzel) ordered him killed, and then there was the part about him hatching. Along the way we meet a conniving chiropodist, a golden goose, a geek who's obsessed with spelling (re: Unsfzpxkable), an alien who loves filing, and then there's the Jellyman. Not to mention that Jack has killed 4 giants before and cuts down a beanstalk to kill the monster.

The first thought to enter my head at the end of the book was: what was this guy smoking?

--
Humpty Dumpty didn't fall, he was pushed.

posted at: 15:17 | path: /books

You can't keep a rainforest in your backyard, but sometimes a tape of David Attenborough's productions will serve as a sufficient substitute. Not only did he produce groundbreaking nature documentaries, but he assembled an amazing team who were equally passionate about the unrivalled beauty of nature. The BBC's Bristol division turned out a series of expensive nature documentaries that set the standard in the field.

My attraction to his particular brand of nature documentary comes from the frequent change of scenery. Where most "modern" documentaries devote an entire half-hour to a particular location or activity, Attenborough documentaries generally travel across the world within the hour. The documentaries also hardly feature any people and do not follow people as characters in the story being played out. I am pretty sure they could've made a few more hours of documentaries from the reels they cut out per hour.

Private Life of Plants is my favorite series by David Attenborough. Now, Blue Planet and other works have taken me far, wide and deep throughout the biosphere. But those leave me with a distant yearning to see these wonders of the world for myself - Christmas Island while the crabs migrate, Palau to see the jellyfish swarms, the fjords of Norway where the whales sound, the Great Barrier Reef when the corals are spawning. Deep in my heart, I know that I'll probably die before I see these desires fulfilled. Private Life of Plants, on the other hand, takes me somewhere which doesn't exist - the world of plants, where months seem to pass in minutes and in 30 seconds we've moved from a frigid winter to a warm spring, through the thaws and amidst the flowers that declare the arrival of spring.

That by itself may seem like a magical trip across time. But there's magic even in the simple things. The background music, for instance, is very appropriate and blends into the action on screen, as if the action had been scripted to the music. Visits to inaccessible islands in the Pacific are preceded by aerial views, which set the scene for some exotic flowers. Even something as simple as a Himalayan balsam seed falling into nearby water is dramatic, accentuated by the plink of the drop and pulled into focus in slow motion.

Amidst all these scenery switches, there are no pictures of an elderly (well, in his sixties) English gentleman lugging his luggage or sweating it out in a 4x4 jeep (no Steve Irwin, he). Even on the rare occasions we see him, it is merely to emphasize the extremity of the environment and how we (or he) just don't fit in.

But as I said earlier, the truly remarkable thing is its fast-forwarded view of the plant world, documenting the fights, battles and conquests of this outwardly immobile flora. There is beauty in a flower opening, blackberries ripening and a mushroom slowly poking its head through the dead leaves. There is nothing quite like just sitting and watching the wind hit a dandelion patch. Watching and wondering, you can't get tired of the magic on screen.

This is the ultimate documentary. *THE* Ultimate.

--
Art is Nature speeded up and God slowed down.
                -- Chazal

posted at: 10:57 | path: /movies

Fri, 02 Jun 2006:

Something happened tonight that made me pick up my employment contract and read it again. As I was reading the whole document, it suddenly hit me that this document specifies restrictions applicable to me, as an employee of Yahoo!, but would not automatically transfer to either the executor of my estate or anybody whom I assign full power of attorney.

So, assuming that I had already assigned full power of attorney to my father (or mother) before I signed this particular contract, would a civil violation by them automatically make me liable?

Well, I don't know ... but I intend to find out.

--
I'm not a lawyer. I don't even play one on TV.
                  -- Linus Torvalds, gcc lists

posted at: 02:39 | path: /me

Thu, 01 Jun 2006:

Valgrind is one of the most common tools people use to debug memory problems. Recently, while debugging APC, my primary problem has been PHP's Zend code writing into shared memory without acquiring the required locks. I had been debugging that with gdb for a while, but gdb is just dead slow at watching writes to 16 MB of memory and generating backtraces.

The result of all that pain was a quick patch against valgrind 3.1.1. The patch logs all writes to a memory block, with backtraces. But unlike gdb, valgrind has no terminal to type into midway through a run, so the question was how to indicate a watchpoint. Valgrind's magic functions (client requests) were the answer: they let the program pass parameters to valgrind while it is executing. This is a source hack, and it is a hell of a lot easier than breaking in gdb and setting a breakpoint every time you run the program. Here's what the code looks like:

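#include <stdlib.h>
/* patched header: VALGRIND_SET/CLEAR_WATCHPOINT come from my patch */
#include "valgrind/memcheck.h"

/* illustrative stand-in for the code whose writes we want to catch */
void modify(int *p)
{
	*p = 42;
}

int main()
{
	int * k = malloc(sizeof(int));
	int x = VALGRIND_SET_WATCHPOINT(k, sizeof(int));
	modify(k);
	VALGRIND_CLEAR_WATCHPOINT(x);
	free(k);
	return 0;
}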

In the compiled code, the macro turns into the following assembly fragment.

    movl    $1296236555, -56(%ebp)
    movl    8(%ebp), %eax
    movl    %eax, -52(%ebp)
    movl    $4, -48(%ebp)
    movl    $0, -44(%ebp)
    movl    $0, -40(%ebp)
    leal    -56(%ebp), %eax
    movl    $0, %edx
    roll $29, %eax ; roll $3, %eax
    rorl $27, %eax ; rorl $5, %eax
    roll $13, %eax ; roll $19, %eax
    movl    %edx, %eax
    movl    %eax, -12(%ebp)

This sequence does nothing at all on a normal x86 CPU (each pair of rotations adds up to a full 32-bit rotation, leaving %eax unchanged), but inside the valgrind executor it is recognized and delivered to mc_handle_client_request, where I handle the new case and add the address and size to the watchpoints list.

So whenever a helperc_STOREV* function is called, the address passed in is checked against the watchpoints list, which is stored in the corresponding primary map of access bits. All of these bright ideas were shamelessly stolen from Richard Walsh's patch for valgrind 2.x. But of course, if it weren't for the giants on whose shoulders I stand ...
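Stripped of valgrind's shadow-memory machinery, the check itself is an interval-overlap test on every store. The C sketch below is a standalone illustration, not code from the patch or from memcheck: watch_set, watch_clear, watch_check_store and watch_hits are hypothetical names, and a linear scan stands in for the per-primary-map lookup described above.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_WATCHPOINTS 8

struct watchpoint {
    uintptr_t start;
    size_t size;
    unsigned long hits;
    int active;
};

static struct watchpoint watches[MAX_WATCHPOINTS];

/* Register [addr, addr + size); returns an id, or -1 if the table is full. */
int watch_set(const void *addr, size_t size)
{
    for (int i = 0; i < MAX_WATCHPOINTS; i++)
        if (!watches[i].active) {
            watches[i] = (struct watchpoint){(uintptr_t)addr, size, 0, 1};
            return i;
        }
    return -1;
}

void watch_clear(int id)
{
    if (id >= 0 && id < MAX_WATCHPOINTS)
        watches[id].active = 0;
}

/* Called for every store of len bytes at addr. Returns the id of the first
 * watchpoint the store overlaps (and counts the hit), or -1 if none. A real
 * tool would record a backtrace at this point. */
int watch_check_store(const void *addr, size_t len)
{
    uintptr_t lo = (uintptr_t)addr, hi = lo + len;
    for (int i = 0; i < MAX_WATCHPOINTS; i++) {
        struct watchpoint *w = &watches[i];
        /* half-open ranges overlap iff each starts before the other ends */
        if (w->active && lo < w->start + w->size && w->start < hi) {
            w->hits++;
            return i;
        }
    }
    return -1;
}

unsigned long watch_hits(int id)
{
    return watches[id].hits;
}
```

A store that ends exactly where the watched block begins, or starts exactly where it ends, is correctly ignored because the ranges are half-open.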

bash$ valgrind a.out

==6493== Watchpoint 0 event: write
==6493==    at 0x804845E: modify (in /home/gopalv/hacks/valgrind-tests/a.out)
==6493==    by 0x80484EA: main (in /home/gopalv/hacks/valgrind-tests/a.out)
==6493== This watchpoint has been triggered 1 time
==6493== This watchpoint was set at:
==6493==    at 0x80484DB: main (in /home/gopalv/hacks/valgrind-tests/a.out)

Now I can run a huge-ass set of tests on php5 after marking the APC shared memory as watched, see all the writes, filter out the APC writes, and then copy the other written segments out into local memory for Zend's pleasure.

Writing software gives you that high of creating something out of nearly nothing. Since I am neither a poet nor a painter, there's no other easy way to get that high (unless ... *ahem*).

--
Mathematicians stand on each other's shoulders while computer scientists stand on each other's toes.
                -- Richard Hamming

posted at: 23:44 | path: /hacks

Some time in late 2002, I got a clear picture of what interpreter optimisation is all about. While I only wrote a small paragraph of the Design of the Portable.net Interpreter, I got a good look at the design decisions that went into pnet. The history of the CVM engine aside, more recently I started looking into the PHP5 engine's core interpreter loop. And believe me, it wasn't written with raw performance in mind.

The VM goes neither the register VM way nor the stack VM way, thereby throwing away years of optimisations that have gone into both. Values are passed between opcodes through the ->result entry of each opcode, which is then used as the op1 or op2 of a later opcode. You can literally see the tree of operations in this data structure. As good as that is for data clarity, it means that every time I add two numbers, I write to a memory location somewhere. For example, I cannot keep a value in a register and have it picked up by the next opcode, which is pretty easy to do with a stack VM.
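To make the stack-VM point concrete, here is a toy C interpreter using top-of-stack caching: the most recent value lives in a local variable that the compiler can keep in a register, so an ADD consumes its right operand without a memory round trip. The opcodes and names are hypothetical and have nothing to do with Zend's.

```c
#include <stddef.h>

/* Toy stack-VM opcodes. */
enum { S_PUSH, S_ADD, S_MUL, S_HALT };

struct sop {
    int code;
    long imm; /* operand for S_PUSH */
};

/* Evaluate with top-of-stack caching: the newest value lives in the local
 * tos, and only older values spill to the stack array (stack[0] receives a
 * dummy spill from the first push). Assumes well-formed code that never
 * underflows and never needs more than 64 slots. */
long run_stack_vm(const struct sop *pc)
{
    long stack[64];
    size_t sp = 0;
    long tos = 0;

    for (;;) {
        switch (pc->code) {
        case S_PUSH:
            stack[sp++] = tos; /* spill the old top */
            tos = pc->imm;
            break;
        case S_ADD:
            tos = stack[--sp] + tos; /* result stays in tos, no store */
            break;
        case S_MUL:
            tos = stack[--sp] * tos;
            break;
        case S_HALT:
            return tos;
        }
        pc++;
    }
}
```

The program PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT evaluates (2 + 3) * 4, and neither intermediate result is written anywhere except tos.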

Nor did I see any concept of verifiability, which means that I cannot predict the types of results or make any assumptions about them either. For example, here is the code for the add operation.

ZEND_ADD_SPEC_VAR_VAR_HANDLER:
{
    zend_op *opline = EX(opline);
    zend_free_op free_op1, free_op2;

    add_function(&EX_T(opline->result.u.var).tmp_var,
        _get_zval_ptr_var(&opline->op1, EX(Ts), &free_op1 TSRMLS_CC),
        _get_zval_ptr_var(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC) TSRMLS_CC);
    if (free_op1.var) {zval_ptr_dtor(&free_op1.var);};
    if (free_op2.var) {zval_ptr_dtor(&free_op2.var);};
    ZEND_VM_NEXT_OPCODE();
}

Since we have no idea what type of zval the operands contain, the code has to run a set of conversions to numbers. Each of these involves a conditional jump (aka an if) somewhere, which is exactly what we're trying to avoid in order to speed things up.
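As an illustration of where those branches come from, here is a toy tagged value and an add that has to inspect both operand types before it can do any arithmetic. struct value, to_number and toy_add are hypothetical and much simpler than a real zval and add_function (which also handle arrays, objects and integer overflow), but the shape of the problem is the same.

```c
#include <stdlib.h>

/* A toy tagged value, loosely in the spirit of a zval (not Zend's layout). */
enum vtype { T_NULL, T_BOOL, T_LONG, T_DOUBLE, T_STRING };

struct value {
    enum vtype type;
    long l;        /* used by T_BOOL and T_LONG */
    double d;      /* used by T_DOUBLE */
    const char *s; /* used by T_STRING */
};

/* Convert any value to a number; every case is a branch to be predicted. */
static struct value to_number(struct value v)
{
    struct value r = {T_LONG, 0, 0.0, NULL};
    switch (v.type) {
    case T_NULL:
        break;
    case T_BOOL:
    case T_LONG:
        r.l = v.l;
        break;
    case T_DOUBLE:
        r.type = T_DOUBLE;
        r.d = v.d;
        break;
    case T_STRING: {
        char *end;
        long l = strtol(v.s, &end, 10);
        if (*end == '.' || *end == 'e' || *end == 'E') {
            r.type = T_DOUBLE; /* "1.5", "1e3": parse as floating point */
            r.d = strtod(v.s, NULL);
        } else {
            r.l = l;
        }
        break;
    }
    }
    return r;
}

/* Without type information, add must test both operands at run time before
 * doing the actual arithmetic. (PHP also promotes to double on overflow;
 * that check is omitted here.) */
struct value toy_add(struct value a, struct value b)
{
    struct value x = to_number(a), y = to_number(b);
    struct value r = {T_LONG, 0, 0.0, NULL};
    if (x.type == T_LONG && y.type == T_LONG) {
        r.l = x.l + y.l;
    } else {
        r.type = T_DOUBLE;
        r.d = (x.type == T_DOUBLE ? x.d : (double)x.l) +
              (y.type == T_DOUBLE ? y.d : (double)y.l);
    }
    return r;
}
```

A verifier that could prove both operands are longs would let the whole function shrink to the single addition in the first branch.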

Nor could I easily keep variables in registers, because of the stupid CALL-based VM (which is flexible enough to allow some weird hacks by replacing opcodes), whose handler functions throw away all their local variables on every call. That's some serious stack space churn, and I can't force anyone to redo it. At least, not yet. So in spite of having a CGOTO core, there was hardly anything I could do without breaking the CALL core codebase.

Basically, after I'd exhausted my usual bag of tricks, I looked a little closer at the assembly thrown out by the compiler. Maybe something not quite obvious was happening in the engine.

.L1031:
    leal    -72(%ebp), %eax
    addl    $76, (%eax)
    movl    -72(%ebp), %eax
    movl    (%eax), %eax
    movl    %eax, -2748(%ebp)
    jmp .L1194
.L203:
    leal    -72(%ebp), %eax
    addl    $76, (%eax)
    movl    -72(%ebp), %eax
    movl    (%eax), %eax
    movl    %eax, -2748(%ebp)
    jmp .L1194
....
.L1194:
    jmp *-2748(%ebp)

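As you can clearly see, the jump target is yet another jump instruction. For a pipelined CPU that's really bad news, especially when the second jump is so far away; and since every handler funnels into the same indirect jmp at .L1194, it is also much harder for the branch predictor to guess the next opcode. So I wrote some inline assembly to remove the double jump and turn it into a single one.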
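For reference, the usual way to ask for one indirect jump per handler in GNU C is to end every handler with its own computed goto, rather than funnelling all handlers into a shared dispatch jump. The toy interpreter below is a sketch of that pattern (run_threaded and the V_* opcodes are hypothetical); whether the compiler keeps the jumps separate is still up to its optimizer, which is exactly the problem the listing above shows.

```c
/* Requires GCC or Clang: uses the "labels as values" extension. */

enum { V_INC, V_DEC, V_DOUBLE, V_HALT };

/* Run a toy program on an accumulator. Each handler ends with its own
 * indirect jump (DISPATCH) instead of branching back to a single shared
 * "jmp *handler" at the bottom of a loop. */
long run_threaded(const int *code, long acc)
{
    static void *const handlers[] = {
        [V_INC] = &&do_inc,
        [V_DEC] = &&do_dec,
        [V_DOUBLE] = &&do_double,
        [V_HALT] = &&do_halt,
    };
    const int *pc = code;

#define DISPATCH() goto *handlers[*pc++]

    DISPATCH();
do_inc:
    acc++;
    DISPATCH();
do_dec:
    acc--;
    DISPATCH();
do_double:
    acc *= 2;
    DISPATCH();
do_halt:
    return acc;
#undef DISPATCH
}
```

GCC's manual recommends trying -fno-gcse for programs that use computed gotos, which is worth checking before resorting to inline assembly.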

#ifdef __i386__
#define ZEND_VM_CONTINUE() do { __asm__ __volatile__ (\
        "jmp *%0" \
        :: "r" (EX(opline)->handler) ); \
    /* just to fool the compiler */ \
    goto * ((void **)(EX(opline)->handler)); } while(0)
#else
#define ZEND_VM_CONTINUE() goto *(void**)(EX(opline)->handler)
#endif

So in i386 land the jump is inline assembly, marked volatile so that it will not be optimised away or rearranged to be more "efficient".

.L1031:
    leal    -72(%ebp), %eax
    addl    $76, (%eax)
    movl    -72(%ebp), %eax
    movl    (%eax), %eax
#APP
    jmp *%eax
#NO_APP
    movl    -72(%ebp), %eax
    movl    (%eax), %eax
    movl    %eax, -2748(%ebp)
    jmp .L1194
.L203:
    leal    -72(%ebp), %eax
    addl    $76, (%eax)
    movl    -72(%ebp), %eax
    movl    (%eax), %eax
#APP
    jmp *%eax
#NO_APP
    movl    -72(%ebp), %eax
    movl    (%eax), %eax
    movl    %eax, -2748(%ebp)
    jmp .L1194

The compiler still needs the goto to realize that it has to flush all the stack params inside the scope, which is why the old code path (now unreachable) remains after each asm jump. I learnt that a long time ago, trying to do the same for DotGNU's amd64 unroller. Anyway, let's look at the numbers.

           
Benchmark           Before     After
simple               0.579     0.482
simplecall           0.759     0.692
simpleucall          1.193     1.111
simpleudcall         1.409     1.320
mandel               2.034     1.830
mandel2              2.551     2.227
ackermann(7)         1.438     1.638
ary(50000)           0.100     0.097
ary2(50000)          0.080     0.080
ary3(2000)           1.051     1.024
fibo(30)             3.914     3.383
hash1(50000)         0.185     0.182
hash2(500)           0.209     0.198
heapsort(20000)      0.616     0.580
matrix(20)           0.500     0.481
nestedloop(12)       0.953     0.855
sieve(30)            0.499     0.494
strcat(200000)       0.079     0.074
-----------------------------------
Total               18.149    16.750
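Put as relative changes (plain arithmetic on the totals, and on the one benchmark that got slower):

```latex
\frac{18.149 - 16.750}{18.149} = \frac{1.399}{18.149} \approx 7.7\%\ \text{less total time},
\qquad
\frac{1.638 - 1.438}{1.438} = \frac{0.200}{1.438} \approx 13.9\%\ \text{more time for ackermann(7)}
```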

For comparison, the default php5 core takes a pathetic 23.583 to complete the same tests. But there's more to the story. If you look carefully, you'll notice a register indirection just before the jump: the handler address is first loaded into %eax and then jumped through. But x86 can also jump indirectly through a memory operand, loading the target and jumping in one instruction.

   __asm__ __volatile__ ("jmp *(%0)" :: "r" (&(EX(opline)->handler)));

That generates a nice jmp *(%eax), which is good enough for my purpose. Except that, as the assembly shows, this fix didn't really do much for performance. For example, look at the following code:

    leal    -72(%ebp), %eax
    addl    $76, (%eax)
#APP
    nop
#NO_APP
    movl    -72(%ebp), %eax
#APP
    jmp *(%eax)
#NO_APP

The EAX load between the two custom asm statements is what I was trying to avoid. But the variable is reloaded from the stack because there is no register variable caching the handler pointer. One way around that is to do what pnet did: keep your PC (the equivalent of the handler variable) in a register, preferably EBX, and use it directly. The separation between operands (stack) and operator (handler) makes it hard to optimize both in one go, and the opline contains both together, making it really, really hard to speed up properly.
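For contrast, here is a minimal sketch of the pnet-style alternative in GNU C, assuming a direct-threaded layout: the instruction stream holds handler addresses with operands inline, and the virtual PC is a local pointer that the compiler is free to keep in a register. The names (cell, run_direct, the D_* opcodes) are hypothetical, and this is not how Zend lays out its oplines.

```c
/* Requires GCC or Clang: uses the "labels as values" extension. */
#include <stddef.h>

/* One cell of a direct-threaded stream: a handler address or an operand. */
typedef union {
    void *handler;
    long operand;
} cell;

/* Toy source opcodes; D_PUSH is followed by one inline operand. */
enum { D_PUSH, D_ADD, D_HALT };

#define MAX_CELLS 64

/* Translate prog into a stream of handler addresses, then run it. Each
 * dispatch reads the next handler address straight from the stream through
 * the local pc and jumps to it. Assumes prog is well formed and fits in
 * MAX_CELLS cells. */
long run_direct(const long *prog, size_t n)
{
    cell code[MAX_CELLS];
    long stack[MAX_CELLS];
    size_t out = 0, sp = 0;

    for (size_t i = 0; i < n; i++) {
        if (prog[i] == D_PUSH) {
            code[out++].handler = &&do_push;
            code[out++].operand = prog[++i]; /* operand stays inline */
        } else if (prog[i] == D_ADD) {
            code[out++].handler = &&do_add;
        } else {
            code[out++].handler = &&do_halt;
        }
    }

    const cell *pc = code;
    goto *(pc++)->handler;

do_push:
    stack[sp++] = (pc++)->operand;
    goto *(pc++)->handler;
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    goto *(pc++)->handler;
do_halt:
    return stack[sp - 1];
}
```

Because operator and operands share one linear stream walked by a single pointer, there is only one hot variable to keep in a register, which is the property the Zend opline layout lacks.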

But there's this thing about me - I'm lazy.

--
Captain, we have lost entire left hamster section.
Now, pedal faster.

posted at: 17:55 | path: /php