I recently discovered an easy way to inspect the bytecode of PHP files. xdebug's CVS repository has a module called vld, which prints out the opcodes generated for PHP code. This lets me actually look at the bytecode produced for a particular PHP data or control structure. The extension shines a bright light into the otherwise dark world of ZendEngine2.
Let me pick on my favourite example of mis-optimisation in PHP land: the heredoc. People use heredocs very heavily, in preference to more mundane ways of putting strings in a file, like the double-quoted strings of the common man. Some PHP programmers even take special pride in using heredocs rather than quoted strings. Most C programmers use heredocs to write multi-line strings, not realizing that PHP quoted strings can span lines.
```php
<?php
echo <<<EOF
	Hello World
EOF;
?>
```
This generates some really ugly, under-optimised bytecode. Don't believe me? Just look at the vld dump:
```
line     #  op                 ext  operands
-------------------------------------------------
   3     0  INIT_STRING             ~0
         1  ADD_STRING              ~0, ~0, '%09'
         2  ADD_STRING              ~0, ~0, 'Hello'
         3  ADD_STRING              ~0, ~0, '+'
         4  ADD_STRING              ~0, ~0, 'World'
   4     5  ADD_STRING              ~0, ~0, ''
         6  ECHO                    ~0
```
That's right: every single piece of the string (the tab, each word and the space between them) is separately appended to a new string. After all the appends and their corresponding reallocs, the string is echoed and thrown away. A really wasteful operation, right? Well, it is, unless you run it through APC's peephole add_string optimizer.
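To make the peephole idea concrete, here is a minimal Python sketch of an ADD_STRING folding pass. It is not APC's code. The opcodes are modelled as hypothetical `(opcode, result, operands)` tuples.

The sketch also assumes that vld prints string constants URL-encoded, judging by the `'%09'` and `'+'` operands in the dump above. The pass merges each run of ADD_STRING ops that append constants to the same temporary into a single ADD_STRING.

```python
from urllib.parse import unquote_plus

# Hypothetical opcode model: (opcode, result, operands). These are plain
# Python tuples, not Zend's real oplines or APC's internal structures.


def fold_add_string(ops):
    """Merge each run of ADD_STRING ops that append constants to the same temp."""
    out = []
    for op in ops:
        name, result, operands = op
        prev = out[-1] if out else None
        if (name == "ADD_STRING" and operands[0] == result
                and prev is not None and prev[0] == "ADD_STRING"
                and prev[1] == result and prev[2][0] == result):
            # The previous op appends to the same temp: concatenate the constants
            # so the string is grown (and reallocated) once instead of per piece.
            out[-1] = ("ADD_STRING", result, (result, prev[2][1] + operands[1]))
        else:
            out.append(op)
    return out


# The heredoc dump, with vld's URL-encoded constants decoded
# ('%09' is a tab, '+' is a space).
dump = [
    ("INIT_STRING", "~0", ()),
    ("ADD_STRING", "~0", ("~0", unquote_plus("%09"))),
    ("ADD_STRING", "~0", ("~0", unquote_plus("Hello"))),
    ("ADD_STRING", "~0", ("~0", unquote_plus("+"))),
    ("ADD_STRING", "~0", ("~0", unquote_plus("World"))),
    ("ADD_STRING", "~0", ("~0", unquote_plus(""))),
    ("ECHO", None, ("~0",)),
]

folded = fold_add_string(dump)
# Three ops remain: INIT_STRING, a single ADD_STRING, and ECHO.
for op in folded:
    print(op)
```

A real pass working on Zend oplines would need two more steps that the sketch skips:

- It would have to check that no jump targets an op in the middle of a run.
- It would have to fix up the numbering of the remaining oplines.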
Then there is the other misleading item in the arsenal: constant arrays. I see hundreds of people use PHP arrays in include files to speed up their code, which does indeed work. But a closer look at the array bytecode shows a few chinks that could actually be fixed in APC land.
```php
<?php
$a = array("x" => "z",
           "a" => "b",
           "b" => "c",
           "c" => "d");
?>
```
This generates the following, highly predictable, result. To be fair, this is hardly different from what most other VMs store in their bytecode. Those VMs, however, are limited by having to actually write the code (minus pointers) out to a file. The Zend Engine works entirely in memory, so it could have stored these arrays directly in an in-memory layout. That would have made APC segfault months before I ran into the default array args issue.
```
line     #  op                 ext  operands
-----------------------------------------------------------
   2     0  INIT_ARRAY              ~0, 'z', 'x'
   3     1  ADD_ARRAY_ELEMENT       ~0, 'b', 'a'
   4     2  ADD_ARRAY_ELEMENT       ~0, 'c', 'b'
   5     3  ADD_ARRAY_ELEMENT       ~0, 'd', 'c'
         4  ASSIGN                  !0, ~0
   7     5  RETURN                  1
         6  ZEND_HANDLE_EXCEPTION
```
APC doesn't optimize this yet, and I think I'll do it sometime soon. After all, I just need to virtually execute the array additions once and cache the resulting hash table as the operand of the ASSIGN. That avoids going through this stupidity every time the code is executed.
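A rough Python sketch of that "virtual execution" idea follows. As before, the opcode tuples are a hypothetical model and not APC code. The `Const` wrapper is an illustrative helper that marks literal operands.

The sketch makes these assumptions:

- Each INIT_ARRAY and ADD_ARRAY_ELEMENT carries a `(value, key)` pair, matching the operand order in the dump above.
- A Python dict stands in for the Zend HashTable, since both preserve insertion order.

When every element of an INIT_ARRAY/ADD_ARRAY_ELEMENT run is constant, the run is executed once at "compile" time. The following ASSIGN is then rewritten to take the prebuilt table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """Hypothetical marker for a literal operand (anything else is a temp/var name)."""
    value: object


def _all_const(operands):
    # Only (value, key) pairs of literals can be folded in this model.
    return len(operands) == 2 and all(isinstance(x, Const) for x in operands)


def fold_constant_arrays(ops):
    out = []
    pending = []   # original ops of the array currently being built
    temp = None    # temporary the array is built into
    table = None   # dict standing in for the Zend HashTable

    def flush():
        # Give up on folding: emit the original ops unchanged.
        nonlocal pending, temp, table
        out.extend(pending)
        pending, temp, table = [], None, None

    for op in ops:
        name, result, operands = op
        if name == "INIT_ARRAY" and _all_const(operands):
            flush()
            value, key = operands
            pending, temp, table = [op], result, {key.value: value.value}
        elif name == "ADD_ARRAY_ELEMENT" and result == temp and _all_const(operands):
            value, key = operands
            pending.append(op)
            table[key.value] = value.value
        elif name == "ASSIGN" and temp is not None and operands == (temp,):
            # Virtually executed: assign the prebuilt table instead of rebuilding it.
            out.append(("ASSIGN", result, (Const(table),)))
            pending, temp, table = [], None, None
        else:
            flush()
            out.append(op)
    flush()
    return out


ops = [
    ("INIT_ARRAY", "~0", (Const("z"), Const("x"))),
    ("ADD_ARRAY_ELEMENT", "~0", (Const("b"), Const("a"))),
    ("ADD_ARRAY_ELEMENT", "~0", (Const("c"), Const("b"))),
    ("ADD_ARRAY_ELEMENT", "~0", (Const("d"), Const("c"))),
    ("ASSIGN", "!0", ("~0",)),
    ("RETURN", None, (Const(1),)),
    ("ZEND_HANDLE_EXCEPTION", None, ()),
]

for op in fold_constant_arrays(ops):
    print(op)
```

If any element is not a literal (say a variable `!1`), the sketch falls back to emitting the original ops. In a real implementation, the cached hash would also have to be treated as read-only, so that a later write to `$a` copies it rather than modifying the shared copy.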
Like rhysw said, "Make it work, then make it work better".
--Organizations can grow faster than their brains can manage them.
-- The Brontosaurus Principle