Opened 9 years ago

Closed 7 years ago

#761 closed enhancement (notabug)

libc not found

Reported by: vanekl
Owned by:
Priority: normal
Milestone:
Component: Foreign Function Interface
Version: trunk
Keywords:
Cc:

Description

After installing Debian Lenny (stable) and the latest version of CCL-trunk and usocket, I get an error if I run (usocket:get-host-name) because #_gethostname is not found.

This is because libc.so is not found in directory /lib. When I add a link to libc.so in /lib then all is well and it works. There /is/ a link to libc.so in the /usr/lib directory but ccl ignores it.

I'm not sure whether ccl's goal is to be newbie-friendly, but if it is, I think it would be wise to look for libc in all the usual places (/lib, /usr/lib) on Linux.

Thanks

Change History (13)

comment:1 Changed 9 years ago by rme

  • Component changed from Compiler to Foreign Function Interface

I'm not sure why that's happening.

We just dlopen() the symbol and call it. We don't look for libc.so on any particular set of directories.

comment:2 Changed 9 years ago by vanekl

I don't have /lib or /usr/lib in my PATH and LD_LIBRARY_PATH is not set, so I don't think it's my environment that is causing the confusion.

dlopen takes a path as an argument:

void *dlopen(const char *file, int mode);

Are you referring to a different version of dlopen? You can't call a function without opening the library up first, and that requires a path.

And this is how dlopen is being used by ccl in file pmcl-kernel.c. Notice the path argument.

void * xGetSharedLibrary(char *path, int mode) {
    return dlopen(path, mode);
}

If you can think of any (simple) tests that would help diagnose this problem I would be glad to run them.

comment:3 Changed 9 years ago by rme

Sorry, I meant to write dlsym().

Does this little program work?

/* foo.c: compile with cc foo.c -ldl */
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(void)
{
    /* Look up gethostname in the libraries loaded after this program. */
    int (*f)(char *, size_t) = dlsym(RTLD_NEXT, "gethostname");
    char name[256];
    int i;

    printf("f = %p\n", (void *)f);

    if (!f)
        exit(1);    /* symbol not found */

    i = f(name, sizeof name);
    if (i == 0) {
        printf("hostname = %s\n", name);
    } else {
        perror("gethostname");
    }
    return 0;
}

comment:4 Changed 9 years ago by gb

CCL doesn't "look for libc"; the dynamic linker does. (E.g., which version of libc is used is determined by the dynamic linker and by tools like "ldconfig".) "libc" differs from some other libraries in that it's already open before any CCL code runs.
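
Because libc is already mapped into the process, a symbol such as gethostname can be resolved without naming any library file at all. The following is only a minimal sketch of that idea in C, assuming a POSIX <dlfcn.h>; the helper name lookup_in_process is hypothetical and is not part of CCL.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper (not CCL code): look up `name` among the objects
   already loaded into this process. Returns NULL if none exports it. */
void *lookup_in_process(const char *name)
{
    assert(name != NULL);

    /* A NULL path asks for a handle to the running program itself,
       which also covers the libraries loaded with it, such as libc. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        return NULL;

    void *addr = dlsym(self, name);
    dlclose(self);  /* releases the handle; the program stays mapped */
    return addr;
}
```

Calling lookup_in_process("gethostname") should return a non-NULL address on a typical Linux system, with no library path involved. On older glibc versions the program may need to be linked with -ldl.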

On most Linux systems, the 'ldd' program can be used to determine the pathnames where the libraries that an executable file references will be found when that executable runs. ldd may generate output something like:

[src/ccl-1.5] gb@rinpoche> cat /etc/issue
Fedora release 13 (Goddard)
Kernel \r on an \m (\l)
[src/ccl-dev] gb@rinpoche> ldd lx86cl
	linux-gate.so.1 =>  (0xf7747000)
	libdl.so.2 => /lib/libdl.so.2 (0xf7729000)
	libm.so.6 => /lib/libm.so.6 (0xf76ff000)
	libpthread.so.0 => /lib/libpthread.so.0 (0xf76e4000)
	libc.so.6 => /lib/libc.so.6 (0x003c2000)
	/lib/ld-linux.so.2 (0x003a0000)

which is saying that (on a Fedora system and with its current dynamic linker configuration) the library "libc.so.6" will be found in /lib. The only library whose pathname appears in the executable is the dynamic linker itself ("/lib/ld-linux.so.2").

Running ldd on the same binary produces different results on a different distribution:

[src/ccl-1.5] gb@coleman> cat /etc/issue
Ubuntu 10.04.1 LTS \n \l
[src/ccl-1.5] gb@coleman> ldd lx86cl
	linux-gate.so.1 =>  (0x00f68000)
	libdl.so.2 => /lib/tls/i686/cmov/libdl.so.2 (0x00f7d000)
	libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0x0025d000)
	libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0x00f45000)
	libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0x009dc000)
	/lib/ld-linux.so.2 (0x00e97000)

The "ldconfig" program maintains a complicated database of rules which map unqualified library names ("libc.so.6") to pathnames of installed libraries. I don't know what the real issue is in your case, but if the rules are causing the wrong version of libc to be found, trying to fix the problem by manually creating symbolic links can mask the real problem and make real fixes harder to implement.

"ldconfig" is hard to use; it's usually only invoked by the distribution's package management system and other low-level tools.

The value of the "LD_LIBRARY_PATH" variable also affects this, as could search directories whose names can be built into the executable (but aren't in CCL's case).
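
The name-to-path mapping described above can also be exercised from C: when dlopen() is given a name that contains no slash, the dynamic linker performs the search itself (LD_LIBRARY_PATH, the ldconfig cache, its default directories). This is a rough sketch under that assumption; the helper name linker_can_find is hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the dynamic linker can resolve and
   load the unqualified library name `soname`, 0 otherwise. */
int linker_can_find(const char *soname)
{
    assert(soname != NULL);

    /* No slash in the name, so the linker applies its own search rules. */
    void *handle = dlopen(soname, RTLD_LAZY);
    if (handle == NULL)
        return 0;

    dlclose(handle);
    return 1;
}
```

On a glibc-based distribution, linker_can_find("libc.so.6") would be expected to succeed regardless of whether the file lives in /lib or somewhere like /lib/tls/i686/cmov/; dlerror() can be consulted after a failure for the linker's own explanation.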

Hardwiring non-standard library paths into some part of CCL would indeed be an extraordinarily stupid thing to do; thankfully, that couldn't work, and whatever's going on here is a bit more subtle than that.

Some of those subtle things may involve library/symbol version information. If you haven't built the lisp kernel yourself, then you're likely using a lisp kernel that rme built (possibly on a Fedora system.) That should work on any reasonably modern Linux distribution, but the means by which it works (and the means by which my Ubuntu system concludes that compatible libraries should be found in /lib/tls/i686/cmov/) is complicated. You might try building the lisp kernel yourself and seeing if that affects ldd's output; this may sound too obvious to be useful, but if you've just done a distribution upgrade, you generally need to reboot in order to fully use the new system. (One of the things that typically happens during reboot is that the ldconfig cache gets rebuilt.)

However this gets resolved, I'd strongly advise you to remove the links that you created; even if they appear to fix the problem, they may in fact just confuse the issue.

I don't think that this is "a CCL problem" in anything like the way that you seem to assume it is, though if there's anything that we can do to avoid the problem, I'd certainly want to do that. At this point, I'm not sure that I understand what the problem is.

If rme's example program works, I'd guess that recompiling the lisp kernel might also work to cause the static and dynamic linkers to conspire to find the right version of libc for your distribution.

(By the way, hostility towards newbies only starts the second time that they misdiagnose a problem and deliver a lecture based on that misdiagnosis. The first time's on the house !)

comment:5 Changed 9 years ago by vanekl

The C program runs as expected. (No problems.)

After deleting the new libc link (as discussed above) and rebooting, there is no problem with _gethostname any more. I guess this was just a one-shot error. I think it's safe to close this ticket.

Sorry for opening the ticket. I should have rebooted and tried to reproduce the error first.

Thanks for the ldd tip.

comment:6 Changed 9 years ago by rme

  • Resolution set to invalid
  • Status changed from new to closed

comment:7 Changed 7 years ago by avodonosov

  • Resolution invalid deleted
  • Status changed from closed to reopened

comment:8 follow-up: Changed 7 years ago by avodonosov

I also have this problem. Reopening the ticket to post the information I have.

I am trying to run CCL on Heroku. When the official CCL 1.8 binary (ftp://ftp.clozure.com/pub/release/1.8/ccl-1.8-linuxx86.tar.gz) is downloaded to Heroku, (ql:quickload :usocket) fails with an error saying that the foreign function gethostname was not found.

When I have more info, I will post it here.

If you want me to run some command (objdump, etc), let me know.

comment:9 in reply to: ↑ 8 Changed 7 years ago by rme

Replying to avodonosov:

OK. We'll need a little more to go on. If nothing else, it might help if you could paste the exact error message.

comment:10 Changed 7 years ago by avodonosov

I have had several Heroku apps and observed this problem several times, but it currently reproduces in only one of them. The error output is below; this is CCL 1.7. If I get more info, I'll post it later.

$ ccl/lx86cl64
Welcome to Clozure Common Lisp Version 1.7-r14925M (LinuxX8664)!
? (defun get-host-name ()
    (ccl::%stack-block ((resultbuf 256))
      (when (zerop (#_gethostname resultbuf 256))
        (ccl::%get-cstring resultbuf))))
; Warning: Interface file #P"/var/lib/buildbot/ccl/linuxx8664-1.7/build/x86-headers64/libc/ccl/functions.cdb" does not exist, and the containing directory does not exist.
; This may mean that that the "ccl:" logical-pathname host has not been properly initialized.
; While executing: CCL::CDB-OPEN, in process listener(1).

Error: Foreign function not found: X86-LINUX64::|gethostname|
While executing: CCL::LOAD-EXTERNAL-FUNCTION, in process listener(1).
Type :POP to abort, :R for a list of available restarts.
Type :? for other options.

1 > (ccl::%get-cstring resultbuf))))

Error: Unbound variable: RESULTBUF
While executing: CCL::CHEAP-EVAL-IN-ENVIRONMENT, in process listener(1).
Type :GO to continue, :POP to abort, :R for a list of available restarts.
If continued: Retry getting the value of RESULTBUF.
Type :? for other options.

2 >

comment:11 Changed 7 years ago by rme

Did you investigate the warning? The #_ reader macro needs to look stuff up in ccl:x86-headers64; in order to work. Your problem seems to be that the interface database isn't where ccl expects to find it.

What does (probe-file "ccl:") say? Does it look right? Are the interfaces in that directory?

Is the CCL_DEFAULT_DIRECTORY environment variable perhaps set to some strange place? Is there some strange place that it needs to be set to? It's typically not necessary to set it, but I have no idea if Heroku has requirements that prevent CCL from being installed there in the typical way.

I would guess that (external-call "gethostname" :address resultbuf :size_t 256) will work fine for you in your example. If so, that's a clear indication that the lack of interfaces is the problem.
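
Outside of Lisp, a rough first check is whether the directory named by CCL_DEFAULT_DIRECTORY actually contains the interface directory mentioned above. The sketch below is only an approximation of what (probe-file "ccl:") reports from inside CCL; the helper name has_subdir is hypothetical, and the subdirectory name x86-headers64 is taken from this ticket for 64-bit x86 Linux.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if `subdir` exists as a directory
   under `base`, 0 otherwise (including when the joined path is too long). */
int has_subdir(const char *base, const char *subdir)
{
    assert(base != NULL && subdir != NULL);

    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", base, subdir);
    if (n < 0 || (size_t)n >= sizeof path)
        return 0;

    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

A caller would first fetch the variable with getenv("CCL_DEFAULT_DIRECTORY"), and, if it is set, pass the result together with "x86-headers64" to has_subdir. A result of 0 would suggest that the variable points somewhere that does not hold the interface database.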

comment:12 Changed 7 years ago by avodonosov

Yes, CCL_DEFAULT_DIRECTORY was set to a wrong value. After unsetting this variable completely, ccl works OK. Thanks for the help.

comment:13 Changed 7 years ago by rme

  • Resolution set to notabug
  • Status changed from reopened to closed