Opened 11 years ago

Last modified 11 years ago

#384 assigned defect

Linux-files performance regression

Reported by: gz Owned by: gb
Priority: normal Milestone:
Component: Performance Version: working-0711
Keywords: Cc:

Description

(defun test-open (n)
  (dotimes (i n)
    (with-open-file (f "tmp.dat" :direction :output :if-exists :supersede)
      (write-string "(ho hum)" f))
    (with-open-file (f "tmp.dat")
      (read-char f))))

I ran this test function multiple times in working-0711 branch versions r11088 and r11089 and 11089 is consistently slower on average, e.g.

Form (TEST-OPEN 250) in Version 1.2-r11089-working-0711  (LinuxX8664)
    Elapsed        User      System          GC        Alloc   Minor   Major   Swaps
    257,019      95,986      36,994       1,319    2,896,000       0       0       0
    251,092      88,986      41,994       2,206    2,896,000       0       0       0
    134,011      72,989      54,992       1,145    2,896,000      32       0       0
    125,924      78,988      46,993       1,147    2,896,000       0       0       0
    130,112      79,988      48,992       2,295    2,896,000       0       0       0
    128,157      78,988      47,993       1,146    2,896,000       0       0       0
    216,794      77,988      50,992       1,147    2,896,000       0       0       0
    277,051      78,988      51,992       2,292    2,896,000       0       0       0
    255,427      77,988      51,992       1,158    2,896,000       0       0       0
    267,159      85,987      44,993       1,142    2,896,000       0       0       0
    253,114      90,986      39,994       1,152    2,896,000       0       0       0
    274,699      80,988      49,993       2,272    2,896,000       0       0       0
    163,449      80,988      46,993       1,158    2,896,000       0       0       0
    127,657      81,987      45,993       1,146    2,896,000       0       0       0
    128,554      81,988      46,992       2,305    2,896,000       0       0       0
    127,557      77,988      47,993       1,148    2,896,000       0       0       0
    127,986      68,990      57,991       1,137    2,896,000       0       0       0
    129,109      73,988      54,992       2,285    2,896,000       0       0       0
    330,468      72,989      56,991       1,141    2,896,000       0       0       0
    298,654      84,987      43,994       1,155    2,896,000       0       0       0
    261,672      86,987      43,993       2,309    2,896,000       0       0       0
    266,398      80,988      48,992       1,136    2,896,000       0       0       0
    264,178      85,987      43,994       8,139    2,896,000       0       0       0
    265,669      85,987      45,993       2,256    2,896,000       0       0       0
    273,692      79,987      47,993       1,132    2,896,000       0       0       0
    142,433      71,990      54,991       1,154    2,896,000       0       0       0
    128,478      84,987      42,994       2,288    2,896,000       0       0       0
    126,702      90,986      34,994       1,144    2,896,000       0       0       0
    127,519      70,989      55,992       1,127    2,896,000       0       0       0
    128,295      75,988      50,992       1,134    2,896,000       0       0       0
    129,247      88,987      39,994       2,269    2,896,000       0       0       0
    134,055      82,987      44,993       1,139    2,896,000       0       0       0
    127,377      76,989      50,992       1,146    2,896,000       0       0       0
    197,773      81,987      46,993       3,014    2,896,000       0       0       0
    259,379      83,987      46,993       9,137    2,896,000       0       0       0
    265,179      86,987      44,993       1,122    2,896,000       0       0       0
    274,282      66,990      64,990       2,260    2,896,000       0       0       0
    268,083      89,986      39,994       1,135    2,896,000       0       0       0
    127,310      74,989      51,992       1,134    2,896,000       0       0       0
    127,727      73,989      52,992       2,303    2,896,000       0       0       0

Averages in Version 1.2-r11089-working-0711  (LinuxX8664)
    197,486      80,888      48,143       1,909    2,896,000       1       0       0
Form (TEST-OPEN 250) in Version 1.2-r11088-working-0711  (LinuxX8664)
    Elapsed        User      System          GC        Alloc   Minor   Major   Swaps
    129,391      75,988      52,992       1,391    2,896,000       0       0       0
    131,708      81,988      46,992       2,228    2,896,000       0       0       0
    131,589      84,987      46,993       1,118    2,896,000       0       0       0
    127,003      76,988      49,993       1,168    2,896,000       0       0       0
    135,771      83,987      50,992       2,317    2,896,000       0       0       0
    129,387      74,989      53,992       1,114    2,896,000       0       0       0
    133,612      84,987      46,993       1,112    2,896,000       0       0       0
    130,583      95,985      33,994       2,273    2,896,000       0       0       0
    129,508      71,989      57,992       1,112    2,896,000       0       0       0
    136,809      89,987      37,994       2,239    2,896,000       0       0       0
    130,573      88,986      41,994       1,160    2,896,000       0       0       0
    130,016      89,987      38,994       1,162    2,896,000       0       0       0
    129,686      90,986      38,994       2,214    2,896,000       0       0       0
    130,396      78,988      49,992       1,121    2,896,000       0       0       0
    128,936      80,987      47,993       1,119    2,896,000       0       0       0
    130,901      85,987      44,993       2,214    2,896,000       0       0       0
    136,721      92,986      36,994       1,105    2,896,000       0       0       0
    129,541      78,988      49,993       2,237    2,896,000       0       0       0
    131,080      79,988      48,992       1,105    2,896,000       0       0       0
    129,769      84,987      43,994       1,151    2,896,000       0       0       0
    129,451      85,987      42,993       2,224    2,896,000       0       0       0
    129,559      84,987      44,993       1,104    2,896,000       0       0       0
    128,017      82,987      44,993       1,105    2,896,000       0       0       0
    136,464      90,987      38,994       2,261    2,896,000       0       0       0
    128,916      81,987      46,993       1,110    2,896,000       0       0       0
    131,120      86,987      43,994       2,220    2,896,000       0       0       0
    127,700      73,989      53,991       1,163    2,896,000      32       0       0
    128,964      84,987      43,994       1,151    2,896,000       0       0       0
    128,882      78,988      49,992       1,195    2,896,000       0       0       0
    131,082      73,989      56,991       2,286    2,896,000       0       0       0
    128,631      86,986      41,994       1,155    2,896,000       0       0       0
    130,047      72,989      56,991       1,153    2,896,000       0       0       0
    130,789      89,987      40,994       2,356    2,896,000       0       0       0
    129,741      72,988      55,992       1,186    2,896,000       0       0       0
    128,347      93,986      34,994       1,155    2,896,000       0       0       0
    129,799      85,987      42,994       2,278    2,896,000       0       0       0
    174,095      75,989      54,991       1,146    2,896,000       0       0       0
    253,844      81,987      46,993       1,155    2,896,000       0       0       0
    275,874      79,988      53,992       1,133    2,896,000       0       0       0
    271,273      91,986      42,994       2,282    2,896,000       0       0       0

Averages in Version 1.2-r11088-working-0711  (LinuxX8664)
    141,889      83,337      46,568       1,537    2,896,000       1       0       0

Note that 11089 is comparable to 11088 in some runs, but it has a lot more cases where the elapsed time almost doubles, and that's what brings up its average.

Elsewhere, Greg has isolated the problem to the syscall changes in linux-files.lisp, and Gary has identified a possible suspect in C library glue; I haven't tried either against this test case yet, wanted to get it recorded first.

Change History (4)

comment:1 Changed 11 years ago by gb

  • Status changed from new to assigned

Among other things, in 11088 the GC times follow a more or less consistent pattern of ~1100 units, ~1100 units, ~2200; there's a little variance there, but it looks like we do cheap GC for a while, then a more expensive one, then cheap again ...

11089 sort of follows the same pattern, but in -some- periods where it's being slow there are some outliers of >8000 units of GC time.

There isn't an obvious correlation between high GC time and general slowness, but the outliers are a little mysterious.

comment:2 Changed 11 years ago by gb

Well, none of this is really conclusive, but:

Welcome to Clozure Common Lisp Version 1.2-r11089-working-0711  (LinuxX8664)!
? (time (dotimes (i 50) (test-open 250)))
(DOTIMES (I 50) (TEST-OPEN 250)) took 5,945,262 microseconds (5.945262 seconds) to run 
                    with 2 available CPU cores.
During that period, 3,370,487 microseconds (3.370487 seconds) were spent in user mode
                    2,566,610 microseconds (2.566610 seconds) were spent in system mode
73,387 microseconds (0.073387 seconds) was spent in GC.
 162,400,000 bytes of memory allocated.
 34 minor page faults, 0 major page faults, 0 swaps.
 Welcome to Clozure Common Lisp Version 1.2-r11088-working-0711  (LinuxX8664)!
? (time (dotimes (i 50) (test-open 250)))
(DOTIMES (I 50) (TEST-OPEN 250)) took 5,837,217 microseconds (5.837217 seconds) to run 
                    with 2 available CPU cores.
During that period, 3,228,510 microseconds (3.228510 seconds) were spent in user mode
                    2,599,605 microseconds (2.599605 seconds) were spent in system mode
73,970 microseconds (0.073970 seconds) was spent in GC.
 162,400,000 bytes of memory allocated.
 33 minor page faults, 0 major page faults, 0 swaps.
NIL

In that comparison, r11088 took about 98% of the time that r11089 took. That's actually more of a difference than I'd expect, but it's not nearly as much as gz and the customer are seeing. Some environmental differences that might explain the different results:

  • This is a Core 2 Duo machine; clozure.com is an AMD Athlon, and some other machines that have exhibited the problem are AMD Opterons.
  • These results were obtained on Fedora 10 (instead of Fedora 6 or ... something very old.)
  • There's practically nothing else running on the machine I tested on.
  • Both of the images I was using had ELF symbols/frozen heap, and that might have affected something.
  • How much consing happens (and how much copying of strings/pathname components goes on) in this test is obviously affected by what the current directory is. Both images were running in working directories whose pathnames had the same length.
  • Something else I'm not thinking of.

I'll see if I can reproduce results that show the variance that gz saw.

I had saved a couple of sets of oprofie info, but I'm not sure how useful they'll be, since they seemed to show that r11089 was slightly faster in those runs.

comment:3 Changed 11 years ago by gb

I've tried running this on clozure.com. Here's some oprofile output, first for r11088:

CPU: AMD64 processors, speed 2194.55 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
    83505 31.9835 no-vmlinux
    76687 29.3721 libpython2.4.so.1.0
    66415 25.4378 LX86CL64-11088
    15837  6.0658 lx86cl64
	CPU_CLK_UNHALT...|
	  samples|      %|
	------------------
	    15387 97.1586 lx86cl64
	      450  2.8414 anon (tgid:5673 range:0x300040e00000-0x300042e00000)
    10816  4.1427 libc-2.5.so

And for the same test - (DOTIMES (I 50) (TEST-OPEN 250)) - in r11089 :

CPU: AMD64 processors, speed 2194.55 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
    65678 43.3521 LX86CL64-11089
    60548 39.9659 no-vmlinux
    16017 10.5723 lx86cl64
	CPU_CLK_UNHALT...|
	  samples|      %|
	------------------
	    15575 97.2404 lx86cl64
	      442  2.7596 anon (tgid:5183 range:0x300040e06000-0x300042e20000)
     3325  2.1947 libc-2.5.so

That happens to be backwards from the results that we expect, and we can clearly see that it's that libpython thing that's slowing down r11088. Of course, the time spent in libpython is incidental; clozure.com isn't a particularly good environment for testing this sort of thing, since things like GNU Mailman (which is a Python program) can run at unpredictable times, causing things that we're trying to profile to take more wall-clock time without taking more CPU time.

I tried a few more times, and each time (as bad luck would have it) some random python program ran when I was trying to profile r11088.

I assume that we actually see a performance regression in a much more controlled environment. So far, I haven't seen that, and "some random program running while profiling" certainly can lead to longer real times without changes in CPU time.

comment:4 Changed 11 years ago by gz

I don't know if the other environment is more controlled, but so far the regression has been reliably reproducible there, so there is something going on. Unfortunately, you're right, this test case doesn't actually have the same reproducibility. I'll keep looking.

Note: See TracTickets for help on using tickets.