Opened 12 years ago
Last modified 12 years ago
#384 assigned defect
Linux-files performance regression
Reported by: | gz | Owned by: | gb |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Performance | Version: | working-0711 |
Keywords: | Cc: |
Description
(defun test-open (n) (dotimes (i n) (with-open-file (f "tmp.dat" :direction :output :if-exists :supersede) (write-string "(ho hum)" f)) (with-open-file (f "tmp.dat") (read-char f))))
I ran this test function multiple times in working-0711 branch versions r11088 and r11089 and 11089 is consistently slower on average, e.g.
Form (TEST-OPEN 250) in Version 1.2-r11089-working-0711 (LinuxX8664) Elapsed User System GC Alloc Minor Major Swaps 257,019 95,986 36,994 1,319 2,896,000 0 0 0 251,092 88,986 41,994 2,206 2,896,000 0 0 0 134,011 72,989 54,992 1,145 2,896,000 32 0 0 125,924 78,988 46,993 1,147 2,896,000 0 0 0 130,112 79,988 48,992 2,295 2,896,000 0 0 0 128,157 78,988 47,993 1,146 2,896,000 0 0 0 216,794 77,988 50,992 1,147 2,896,000 0 0 0 277,051 78,988 51,992 2,292 2,896,000 0 0 0 255,427 77,988 51,992 1,158 2,896,000 0 0 0 267,159 85,987 44,993 1,142 2,896,000 0 0 0 253,114 90,986 39,994 1,152 2,896,000 0 0 0 274,699 80,988 49,993 2,272 2,896,000 0 0 0 163,449 80,988 46,993 1,158 2,896,000 0 0 0 127,657 81,987 45,993 1,146 2,896,000 0 0 0 128,554 81,988 46,992 2,305 2,896,000 0 0 0 127,557 77,988 47,993 1,148 2,896,000 0 0 0 127,986 68,990 57,991 1,137 2,896,000 0 0 0 129,109 73,988 54,992 2,285 2,896,000 0 0 0 330,468 72,989 56,991 1,141 2,896,000 0 0 0 298,654 84,987 43,994 1,155 2,896,000 0 0 0 261,672 86,987 43,993 2,309 2,896,000 0 0 0 266,398 80,988 48,992 1,136 2,896,000 0 0 0 264,178 85,987 43,994 8,139 2,896,000 0 0 0 265,669 85,987 45,993 2,256 2,896,000 0 0 0 273,692 79,987 47,993 1,132 2,896,000 0 0 0 142,433 71,990 54,991 1,154 2,896,000 0 0 0 128,478 84,987 42,994 2,288 2,896,000 0 0 0 126,702 90,986 34,994 1,144 2,896,000 0 0 0 127,519 70,989 55,992 1,127 2,896,000 0 0 0 128,295 75,988 50,992 1,134 2,896,000 0 0 0 129,247 88,987 39,994 2,269 2,896,000 0 0 0 134,055 82,987 44,993 1,139 2,896,000 0 0 0 127,377 76,989 50,992 1,146 2,896,000 0 0 0 197,773 81,987 46,993 3,014 2,896,000 0 0 0 259,379 83,987 46,993 9,137 2,896,000 0 0 0 265,179 86,987 44,993 1,122 2,896,000 0 0 0 274,282 66,990 64,990 2,260 2,896,000 0 0 0 268,083 89,986 39,994 1,135 2,896,000 0 0 0 127,310 74,989 51,992 1,134 2,896,000 0 0 0 127,727 73,989 52,992 2,303 2,896,000 0 0 0 Averages in Version 1.2-r11089-working-0711 (LinuxX8664) 197,486 80,888 48,143 1,909 2,896,000 1 0 0
Form (TEST-OPEN 250) in Version 1.2-r11088-working-0711 (LinuxX8664) Elapsed User System GC Alloc Minor Major Swaps 129,391 75,988 52,992 1,391 2,896,000 0 0 0 131,708 81,988 46,992 2,228 2,896,000 0 0 0 131,589 84,987 46,993 1,118 2,896,000 0 0 0 127,003 76,988 49,993 1,168 2,896,000 0 0 0 135,771 83,987 50,992 2,317 2,896,000 0 0 0 129,387 74,989 53,992 1,114 2,896,000 0 0 0 133,612 84,987 46,993 1,112 2,896,000 0 0 0 130,583 95,985 33,994 2,273 2,896,000 0 0 0 129,508 71,989 57,992 1,112 2,896,000 0 0 0 136,809 89,987 37,994 2,239 2,896,000 0 0 0 130,573 88,986 41,994 1,160 2,896,000 0 0 0 130,016 89,987 38,994 1,162 2,896,000 0 0 0 129,686 90,986 38,994 2,214 2,896,000 0 0 0 130,396 78,988 49,992 1,121 2,896,000 0 0 0 128,936 80,987 47,993 1,119 2,896,000 0 0 0 130,901 85,987 44,993 2,214 2,896,000 0 0 0 136,721 92,986 36,994 1,105 2,896,000 0 0 0 129,541 78,988 49,993 2,237 2,896,000 0 0 0 131,080 79,988 48,992 1,105 2,896,000 0 0 0 129,769 84,987 43,994 1,151 2,896,000 0 0 0 129,451 85,987 42,993 2,224 2,896,000 0 0 0 129,559 84,987 44,993 1,104 2,896,000 0 0 0 128,017 82,987 44,993 1,105 2,896,000 0 0 0 136,464 90,987 38,994 2,261 2,896,000 0 0 0 128,916 81,987 46,993 1,110 2,896,000 0 0 0 131,120 86,987 43,994 2,220 2,896,000 0 0 0 127,700 73,989 53,991 1,163 2,896,000 32 0 0 128,964 84,987 43,994 1,151 2,896,000 0 0 0 128,882 78,988 49,992 1,195 2,896,000 0 0 0 131,082 73,989 56,991 2,286 2,896,000 0 0 0 128,631 86,986 41,994 1,155 2,896,000 0 0 0 130,047 72,989 56,991 1,153 2,896,000 0 0 0 130,789 89,987 40,994 2,356 2,896,000 0 0 0 129,741 72,988 55,992 1,186 2,896,000 0 0 0 128,347 93,986 34,994 1,155 2,896,000 0 0 0 129,799 85,987 42,994 2,278 2,896,000 0 0 0 174,095 75,989 54,991 1,146 2,896,000 0 0 0 253,844 81,987 46,993 1,155 2,896,000 0 0 0 275,874 79,988 53,992 1,133 2,896,000 0 0 0 271,273 91,986 42,994 2,282 2,896,000 0 0 0 Averages in Version 1.2-r11088-working-0711 (LinuxX8664) 141,889 83,337 46,568 1,537 2,896,000 1 0 0
Note that 11089 is comparable to 11088 in some runs, but it has a lot more cases where the elapsed time almost doubles, and that's what brings up its average.
Elsewhere, Greg has isolated the problem to the syscall changes in linux-files.lisp, and Gary has identified a possible suspect in C library glue; I haven't tried either against this test case yet, wanted to get it recorded first.
Change History (4)
comment:1 Changed 12 years ago by gb
- Status changed from new to assigned
comment:2 Changed 12 years ago by gb
Well, none of this is really conclusive, but:
Welcome to Clozure Common Lisp Version 1.2-r11089-working-0711 (LinuxX8664)! ? (time (dotimes (i 50) (test-open 250))) (DOTIMES (I 50) (TEST-OPEN 250)) took 5,945,262 microseconds (5.945262 seconds) to run with 2 available CPU cores. During that period, 3,370,487 microseconds (3.370487 seconds) were spent in user mode 2,566,610 microseconds (2.566610 seconds) were spent in system mode 73,387 microseconds (0.073387 seconds) was spent in GC. 162,400,000 bytes of memory allocated. 34 minor page faults, 0 major page faults, 0 swaps.
Welcome to Clozure Common Lisp Version 1.2-r11088-working-0711 (LinuxX8664)! ? (time (dotimes (i 50) (test-open 250))) (DOTIMES (I 50) (TEST-OPEN 250)) took 5,837,217 microseconds (5.837217 seconds) to run with 2 available CPU cores. During that period, 3,228,510 microseconds (3.228510 seconds) were spent in user mode 2,599,605 microseconds (2.599605 seconds) were spent in system mode 73,970 microseconds (0.073970 seconds) was spent in GC. 162,400,000 bytes of memory allocated. 33 minor page faults, 0 major page faults, 0 swaps. NIL
In that comparison, r11088 took about 98% of the time that r11089 took. That's actually more of a difference than I'd expect, but it's not nearly as much as gz and the customer are seeing. Some environmental differences that might explain the different results:
- This is a Core 2 Duo machine; clozure.com is an AMD Athlon, and some other machines that have exhibited the problem are AMD Opterons.
- These results were obtained on Fedora 10 (instead of Fedora 6 or ... something very old.)
- There's practically nothing else running on the machine I tested on.
- Both of the images I was using had ELF symbols/frozen heap, and that might have affected something.
- How much consing happens (and how much copying of strings/pathname components goes on) in this test is obviously affected by what the current directory is. Both images were running in working directories whose pathnames had the same length.
- Something else I'm not thinking of.
I'll see if I can reproduce results that show the variance that gz saw.
I had saved a couple of sets of oprofie info, but I'm not sure how useful they'll be, since they seemed to show that r11089 was slightly faster in those runs.
comment:3 Changed 12 years ago by gb
I've tried running this on clozure.com. Here's some oprofile output, first for r11088:
CPU: AMD64 processors, speed 2194.55 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000 CPU_CLK_UNHALT...| samples| %| ------------------ 83505 31.9835 no-vmlinux 76687 29.3721 libpython2.4.so.1.0 66415 25.4378 LX86CL64-11088 15837 6.0658 lx86cl64 CPU_CLK_UNHALT...| samples| %| ------------------ 15387 97.1586 lx86cl64 450 2.8414 anon (tgid:5673 range:0x300040e00000-0x300042e00000) 10816 4.1427 libc-2.5.so
And for the same test - (DOTIMES (I 50) (TEST-OPEN 250)) - in r11089 :
CPU: AMD64 processors, speed 2194.55 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000 CPU_CLK_UNHALT...| samples| %| ------------------ 65678 43.3521 LX86CL64-11089 60548 39.9659 no-vmlinux 16017 10.5723 lx86cl64 CPU_CLK_UNHALT...| samples| %| ------------------ 15575 97.2404 lx86cl64 442 2.7596 anon (tgid:5183 range:0x300040e06000-0x300042e20000) 3325 2.1947 libc-2.5.so
That happens to be backwards from the results that we expect, and we can clearly see that it's that libpython thing that's slowing down r11088. Of course, the time spent in libpython is incidental; clozure.com isn't a particularly good environment for testing this sort of thing, since things like GNU Mailman (which is a Python program) can run at unpredictable times, causing things that we're trying to profile to take more wall-clock time without taking more CPU time.
I tried a few more times, and each time (as bad luck would have it) some random python program ran when I was trying to profile r11088.
I assume that we actually see a performance regression in a much more controlled environment. So far, I haven't seen that, and "some random program running while profiling" certainly can lead to longer real times without changes in CPU time.
comment:4 Changed 12 years ago by gz
I don't know if the other environment is more controlled, but so far the regression has been reliably reproducible there, so there is something going on. Unfortunately, you're right, this test case doesn't actually have the same reproducibility. I'll keep looking.
Among other things, in 11088 the GC times follow a more or less consistent pattern of ~1100 units, ~1100 units, ~2200; there's a little variance there, but it looks like we do cheap GC for a while, then a more expensive one, then cheap again ...
11089 sort of follows the same pattern, but in -some- periods where it's being slow there are some outliers of >8000 units of GC time.
There isn't an obvious correlation between high GC time and general slowness, but the outliers are a little mysterious.