not my sock 04 Oct 2005

Tue, 04 Oct 2005:

Python speed-tracer code

This is some code I redid for lawgon while on IRC. It dates back to when I was trying to get BitTorrent to tunnel over HTTP and had trouble figuring out which function was called by what. It is not as good as the original version, but it is quite handy if you want to debug Python without sprinkling print statements everywhere :)

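import sys

def tracer(frame, event, arg):
    # Report each function call along with the file and line it was called from.
    if event == "call":
        caller = frame.f_back
        if caller is not None:  # the outermost frame has no caller
            print("%s() --> from %s:%d" % (frame.f_code.co_name, caller.f_code.co_filename, caller.f_lineno))

def fact(i):
    if i <= 1:
        return 1
    return fact(i - 1) * i

sys.settrace(tracer)

def main():
    print(fact(9))

main()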

This is by no means original code and shouldn't be mistaken for my invention. If you want, you could try playing around with frame.f_globals["__file__"] or __module__. I think you could even hack in linecache to build a debugger of sorts. Python rocks !!
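As a rough sketch of the linecache idea, the tracer can listen for "line" events and look up the source text of each line as it runs. The helper names make_line_tracer and trace_call are made up for this example, and it collects the lines in a list instead of printing them directly; treat it as one way to start, not a finished debugger.

```python
import linecache
import sys

def make_line_tracer(log):
    """Build a trace function that records (filename, lineno, source) tuples."""
    def tracer(frame, event, arg):
        if event == "line":
            filename = frame.f_code.co_filename
            lineno = frame.f_lineno
            # getline() returns '' when the source file cannot be found.
            source = linecache.getline(filename, lineno).rstrip()
            log.append((filename, lineno, source))
        # Returning the tracer keeps line events coming for this frame.
        return tracer
    return tracer

def trace_call(func, *args):
    """Run func(*args) under the line tracer and return (result, log)."""
    log = []
    previous = sys.gettrace()
    sys.settrace(make_line_tracer(log))
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever was there before
    return result, log

def fact(i):
    if i <= 1:
        return 1
    return fact(i - 1) * i

result, log = trace_call(fact, 3)
for filename, lineno, source in log:
    print("%s:%d  %s" % (filename, lineno, source))
print("result = %d" % result)
```

Restoring the previous trace function in the finally block keeps the tracing from leaking into the rest of the program if the traced function raises.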

--
* If you see this blog rockin', don't come a knockin'

posted at: 18:43 | path: /hacks | permalink | Tags: public

AMD64 Unroller under construction

Over the weekend, abhi and I hacked up the unroller for the AMD64 instruction set. The instruction set is very similar to x86, so we just took the x86 unroller and are slowly modding it to work for 64 bit. md_amd64.h is already in CVS. The unroller is not enabled yet.

I am still having problems with some (int) casts in the code-base. AMD64 has 32 bit integers, 64 bit words and 64 bit pointers, so an (int) cast on a pointer silently drops the upper 32 bits. Also the byte addressing logic is absolutely screwed, which makes it very hard to access a byte[] array properly without wasting space on padding. The register allocation ordering was swapped around to -

The RBP was promoted up the order, though not really made use of properly. Ideally, the effect of that will kick in when we have the array opcodes done up. The benchmarks are promising already. Pnetmark scores are up from 1608 to 4210 after just two days of work (very very hard work).
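To make the (int) cast trouble mentioned earlier a bit more concrete, here is a small Python sketch using ctypes. The address is made up and the helper name int_cast is hypothetical; it assumes a platform where a C int is 32 bits, which is the usual case on AMD64.

```python
import ctypes

def int_cast(pointer_value):
    """Mimic C's (int) cast: ctypes wraps the value to the width of a C int."""
    return ctypes.c_int(pointer_value).value

# Made-up 64-bit address, chosen only for illustration.
address = 0x00007FFF12345678

print("sizeof(int)    = %d" % ctypes.sizeof(ctypes.c_int))
print("sizeof(void *) = %d" % ctypes.sizeof(ctypes.c_void_p))
print("original   : %#x" % address)
print("after cast : %#x" % int_cast(address))
```

Small values survive the round trip, which is probably why such casts go unnoticed on 32 bit builds and only bite once pointers no longer fit in an int.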

I think it'll take around a week more to finish the unroller. This is the advantage that full JITs don't have - I can already test out the unroller fully by just commenting out the TODO areas and letting the interpreter handle those opcodes. Mixed mode execution is the holy grail of easy optimisation.
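A toy sketch of the mixed mode idea, in Python rather than the real C code: a tiny stack machine where some opcodes have a "fast" handler and everything else falls back to the plain interpreter. All the names here (INTERPRETER, UNROLLED, run, the three opcodes) are invented for the example and have nothing to do with the actual engine.

```python
def interp_push(stack, arg):
    stack.append(arg)

def interp_add(stack, arg):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def interp_mul(stack, arg):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)

# The interpreter knows every opcode.
INTERPRETER = {"push": interp_push, "add": interp_add, "mul": interp_mul}

# Pretend only these opcodes are "unrolled" so far; "mul" is still a TODO.
UNROLLED = {
    "push": lambda stack, arg: stack.append(arg),
    "add": lambda stack, arg: stack.append(stack.pop() + stack.pop()),
}

def run(program, unrolled):
    """Execute program, preferring unrolled handlers; count the fallbacks."""
    stack = []
    fallbacks = 0
    for opcode, arg in program:
        handler = unrolled.get(opcode)
        if handler is None:
            # Not unrolled yet: let the interpreter handle this opcode.
            handler = INTERPRETER[opcode]
            fallbacks += 1
        handler(stack, arg)
    return stack.pop(), fallbacks

PROGRAM = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]

print(run(PROGRAM, UNROLLED))  # partially unrolled
print(run(PROGRAM, {}))        # pure interpreter
```

Both runs give the same answer. Only the number of fallbacks changes, which is what lets a half-finished unroller be tested against the full interpreter.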

The null check elimination has to be redone for AMD64. Instead of doing an if(NULL ==) check, the system relies on catching SIGSEGV and using sys/ucontext.h to figure out the registers. From that we work backward to find the exception handler and jump there. Amazingly complicated code from tum, but it works like a wonder. I need to understand how that works before I can re-implement it for this new CPU. It does wonders for your object access code - makes it faster than C with null checks.
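A loose Python analogy for the two styles of null check: one function tests before every access, the other just does the access and handles the fault if it happens. This is only an illustration of the idea; it does not touch SIGSEGV or ucontext, and the names Node, read_with_check and read_with_trap are invented for the example.

```python
class Node(object):
    def __init__(self, value):
        self.value = value

def read_with_check(obj):
    # Explicit test on every access, like if (NULL == obj) in C.
    if obj is None:
        return "null reference"
    return obj.value

def read_with_trap(obj):
    # No test on the fast path; the fault is handled only when it happens.
    try:
        return obj.value
    except AttributeError:
        return "null reference"

for candidate in (Node(7), None):
    print("%r  %r" % (read_with_check(candidate), read_with_trap(candidate)))
```

The trap version pays nothing extra when the reference is valid, which is the common case, and that is where the speed-up would come from.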

The FPU code for x87 would work for amd64 as well, I suppose. Might not need too many mods - far less compared to the other stuff that uses void * arithmetic. I'm tired of pushing bits right, right, right and then adding the regs. X86 is teh suck for us binary programmers.

--
If code is poetry, I write limericks.

posted at: 11:10 | path: /dotgnu | permalink | Tags: public
