Computers Suck, episode MCMXXLIV
Jan. 16th, 2003 11:16 pm
I did send off a cover letter and a networking ping. That was something productive.
I don't understand why my little loop is missing the cache. Assuming it is missing the cache, that is (and what else could it be doing for 500 cycles, considering the cache-miss-unaware CPU simulator thinks it takes less than 50?). But the whole data set should fit in L1, and I've tried preloading it immediately beforehand, and it's still 500 cycles. So I wanted to confirm that it is in fact missing.
AMD's tool can access the performance counters, which can be set up to count cache misses among other things. But the tutorial is totally opaque. Enter 0x1c into this box, enter 3 into that, press Run, get a table of a lot of numbers. What did I just do? What are these numbers anyway; why don't the counters count upwards? The icing is that it crashes on my code. (Conceivably it's exercising a genuine bug, but I can't see how.)
I did the little registration dance to get a trial copy of Intel's VTune [insert half-hour download and obligatory reboot], which ought to support Intel performance counters, but I can't see how to get at them. I'm not at all sure the Athlon's counters are backward-compatible with the P6's on this, anyway.
Okay, what if I bracket this stretch of my code with asm to set up and read the performance counter myself? I find lots of libraries to do this, but they're all for Linux. Fine, hack it up myself. Now I can't find a freakin' Athlon instruction set reference on AMD's site. Well, the web has the necessary bits and pieces. Oops, they tell me I have to be ring 0 to set this stuff up.
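For what it's worth, the bracketing idea can be sketched in user mode with the time-stamp counter, which (unlike the event counters) normally needs no ring 0 setup to read. This is a minimal sketch under assumptions, not the code from this post: the names `read_ticks`, `sum_data`, and `time_warm_pass` are hypothetical, the loop is a stand-in for the real one, it assumes GCC or Clang (for `__rdtsc()` from `<x86intrin.h>`), and on non-x86 machines it falls back to `clock_gettime`, so the result is in nanoseconds there rather than cycles.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime in the fallback path */
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>            /* __rdtsc() on GCC and Clang */
#endif

/* Assumption: 1024 ints (4 KiB with 4-byte int), far smaller than an L1 data cache. */
enum { N = 1024 };
static volatile int data[N];

/* Read an increasing tick count: TSC ticks on x86, nanoseconds elsewhere. */
static uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Stand-in for the loop under test: one pass over the whole array. */
static long sum_data(void)
{
    long total = 0;
    for (size_t i = 0; i < N; i++)
        total += data[i];
    return total;
}

/* Fill the array, run one untimed pass to pull it into cache,
 * then bracket a second pass with two tick reads. */
static uint64_t time_warm_pass(long *sum_out)
{
    for (size_t i = 0; i < N; i++)
        data[i] = (int)i;

    (void)sum_data();                 /* warm-up pass, not timed */

    uint64_t start = read_ticks();
    long total = sum_data();          /* the bracketed stretch */
    uint64_t stop = read_ticks();

    *sum_out = total;
    return stop - start;
}
```

Calling `time_warm_pass(&sum)` returns the elapsed ticks for the second pass. That only says how long the stretch took, not why: counting actual cache misses still means programming the event counters, which is the part that needs ring 0.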
So now I'm doing a device driver. For which I need the WinXP DDK. The DDK, oh seriously non-frabjous day, is no longer downloadable. But! You can order it on CD for free. But! More specifically that's free plus $15 S/H. H A T E
I found a stale copy of the Win2K DDK, which I bet will work. But I'm not looking at it until tomorrow.
Bowling was fun. We almost torpedoed it by leaving our phone off the hook all day, but not quite.