Introduction to Strace Sections 1. What is strace? 1a. What is a process? 1b. What is a system call? 1c. What information can strace capture? Not capture? 1d. What do I have to know before I can use strace? 1e. When am I not able to use strace? 1f. Resource considerations 2. Before you start to strace 2a. Ensuring strace is installed 2b. Finding the right process 3. Starting strace and basic options 3a. Strace a process start to finish 3b. Attaching to an already running process: -p PID 3c. Saving strace output to a file: -o filename 3d. Following forks: -f and -ff 4. Reading strace output 4a. File Handles 4b. File handles 1 and 2 4c. Manual pages (getting more info) 5. Example: permission issues 1. What is strace? Strace is a “system call trace” program. It attaches to a process and tracks system calls and signals made to and from it (and possibly it’s children). There are limitations to strace, some of which are detailed below. However, strace can be a very valuable tool for determining the root cause of many issues. This post only covers very basic usage of strace. The example used here can be done using jailshell. What is a process? For the purposes of this tutorial, a process is a unique instance of program being run, identified on the system by a unique process identifier (pid). For more information see Process definition by The Linux Information Project (LINFO) and Process identifier - Wikipedia, the free encyclopedia What is a system call? A system call is how a program requests that the operating system kernel perform a task for it. There are quite a few “normal” tasks that the operating system kernel does for programs regularly, including extremely common file operations like reading and writing. More information regarding system calls can be found at: System call - Wikipedia, the free encyclopedia What information can strace capture? Not capture? Strace can tell you • what files and network connections were opened, closed, or attempted • what was read from and written to from said files and connections Strace cannot tell you what is happening inside the process. It is not a debugger and is not aware of the variables used inside the process. What do I have to know before I can use strace? Strace is a power tool in your troubleshooting bag. But it’s not the most basic tool. General troubleshooting (checking log files, researching error messages, etc) should be done before attempting to solve a problem with strace. Before using strace, you’ll need to be able to reliably reproduce the issue so that you can catch it with strace. You’ll also need to determine what process to strace (covered later). When am I not able to use strace? In addition to there being times when strace will not be helpful, there are a few scenarios when you can not use strace. While most normal processes can be attached to by strace, there are some exceptions. You cannot attach to a process in these scenarios: • when a debugger or other tracing program is already attached • when you do not own the process, unless you are running the strace as root • when the system has special protections preventing strace from being used to gather information Resource considerations Output files created by strace files do get very big very quickly, but more importantly are the numerous writes that strace output produces. This i/o hit is not mitigated (and is frequently made worse) by outputting to the terminal instead of a file. 2. Before you start to strace Ensuring strace is installed You can determine if strace is installed by typing: Code: strace -V If strace is installed, you’ll see something like: strace -- version 4.5.19. If it’s not installed, you’ll see something like bash: strace: command not found. Install strace with: Code: yum install strace This command does require root access, so if you do not have root, you will need to request that your web hosting provider install strace for you. Finding the right process The ps command is very useful if you need to attach strace to a long running existing process (like a backup that you think may have stalled). The important column is the PID column. It will frequently be either the first or second column, depending on which options you use. What output you can see with the ps command is greatly dependent on your permissions for the server. A user with jailshell will frequently see very little: ￼The root user will see quite a bit more (part of the results from the command ps wwwfux): ￼ A jailshell user will not be able to see processes that it did not start in the terminal they are running ps from. This includes php processes and cPanel login sessions. A normal/bash shell user will be able to see all their processes - and processes running as other users. If you are attaching strace as the process starts, you will not need to use ps to determine the pid. If you only have jail shell access, you will be limited to using strace on processes you can start yourself (from the jail shell). 3. Starting strace and basic options There are two ways to attach a strace to a process: you can start the process with strace attached or you can attach strace to an already running process. Strace a process start to finish Staring a process with strace already attached means starting strace and having strace start the process. You tell strace you want it to start the process by adding the command you want to strace as the last argument of strace. This sounds complicated, but is easy. For example, if you wanted to strace the script ./chdir.sh start to finish, you would use this command: Code: # strace ./chdir.sh will print the strace output generated when running the shell script chdir.sh to the screen. Note that the command does run normally - any actions normally performed by the script being invoked will occur normally. Strace logs them, it does not stop them from occurring. Attaching to an already running process: -p PID If instead, you wish to attach to a long running process (like, say, a backup) you would use the similar command: Code: # strace [COLOR="#0000FF"]-p 12345[/COLOR] In this case, the blue text is the -p (pid) option followed by the target process’s process ID. To detach from a strace, use Ctrl-D. Saving strace output to a file: -o filename In most cases, you’ll want to save your strace output to a file. This is frequently faster than outputting to the terminal, and it gives you the opportunity to easily search the output using tools like grep. For example to save the output of the above command to a file called “debug.strace” in the current directory, you would use the command: Code: # strace -o debug.strace -p 12345 You can use any path that you have access to. The file does not need to already exist. If it does, strace will overwrite it. If you use a path you don’t have access to, you’ll get a Permission denied error and strace won’t attach: Code: # strace -f -o /root/perl-quiet2.strace ./open-silent.pl strace: can't fopen '/root/perl-quiet2.strace': Permission denied Following forks: -f and -ff Many processes create child processes to handle some of the work, and you will almost certainly need to tell strace to log them as well. This is recursive - if the child process creates a child, that “grandchild” will also be included in the strace. This is called “following forks”, and there are two ways to do so with strace. You can have strace merge all the output together with -f, or you can use -ff with -o and have strace write one file for each process (-ff). 4. Reading strace output Using strace to document the activity of a very simple script like this one: Code: #!/bin/bash cd /home/ produces a 6.3K file when using a invoking strace with this command: Code: strace -o changedir ./chdir.sh The script is functionally useless, but stracing it does produce some output for us to examine. Please do try this at home; it’s the best way to learn. When you do, you’ll see that the first line in the strace is: execve("./chdir.sh", ["./chdir.sh"], [/* 22 vars */]) = 0 You’ll see a line similar to this (with the execve call) when you start strace with a command. You will not see it if you attach strace to an already running process. The majority of the file contains information that we don’t care about as beginners, but there are a few items of interest. File Handles The second line of interest is: open("./chdir.sh", O_RDONLY) = 3 This is the operating system opening the script we’re running. It’s using the open system call to open the file ./chdir.sh in readonly mode. It’s has been assigned file handle #3. After it opens the file, it reads it. Strace does not give you the name of the file ever again - it only tells you what file handle it’s reading from: read(3, "#!/bin/bash\n\ncd /home/\n\n", 80) = 24 The above line tells you that the system read 24 bytes from file handle #3. It even helpfully tells you the content of those 24 bytes, because it’s short. Normally the entire amount read by the system will not be included in the strace - only the first small amount. To determine what that file is, you have to scroll up and see what the last file that was opened successfully with that file handle. Linux does reuse file handles, so it’s not uncommon to see something like: Code: open("./chdir.sh", O_RDONLY) = 3 read(3, "#!/bin/bash\n\ncd /home/\n\n", 80) = 24 close(3) open("./read.sh", O_RDONLY) = 3 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffbfda78c0) = -1 ENOTTY (Inappropriate ioctl for device) In this case, the error is not important - the kernel is simply telling the program that the file ./read.sh is not a terminal. You will see this frequently in strace output and can ignore it. The important detail to remember here is that file handles are re-used, and it’s important to read up from the line your interested in to find what file is currently being referenced by the file handle. File handles 1 and 2 File handles 1 and 2 are special. File handle 1 is standard input and file handle 2 is standard output. Standard error is another special file handle that, if used, frequently takes up file handle 3. A tutorial on standard input, output, and error can be found at: Standard input and output definitions by The Linux Information Project (LINFO) Manual pages (getting more info) The linux manual pages are divided into sections. Section 2 is for system calls, like the ones you will see in strace output. Each system call has it’s own manual page that includes all the things that are allowed to go as arguments (between the parentheses) and all the return values (things that can appear after the “=” sign). At this stage, the manual page is mostly useful for looking up things that appear in the strace output. For instance, an example used later will contain the following output: open("./chdir.sh", O_RDONLY) = 3 If you want to know what O_RDONLY means, you can either search the web for “man open linux” and use your browsers “find on page” function to find the information regarding O_RDONLY or you can use Code: man 2 open and (by default) the “/” key to search inside the manpage. The first instance of O_RDONLY on the page says: We now know that the file was opened read only. Lets say that instead of the above, output, we saw: open("./chdir.sh", O_RDONLY) = EACCES We then search on the manual page for EACCES and find: This is a little verbose, but it basically says that this is an access error - like permission denied. 5. Example: Permission Issues Permission issues are very common on linux servers, and can sometimes be hard to track down. Strace can be a helpful tool for many permissions issues. With cPanel, you’ll be using strace on a lot of perl scripts. It’s also likely that there will be multiple errors in one script. This example shows two permissions and a missing file issue. Create a file in your working directory called “open-warn.pl” and copy the contents below into it. Ensure that it is set to be executable (chmod +x open-warn.pl) Create a file called “test-read” that contains a short string, like “This is a test file”, and ensure that it has 0 permissions. Create a directory called “content”, and copy the test-read file into it. Ensure that the content folder has 0 permissions, and that the test-read file inside the content folder it is set to 644. Ensure that there is no file named “test-read-missing” in the working directory. Run strace on open-warn.pl, following forks and outputting to the file perl-test.strace The file for step 1: Code: [~/public_html]# cat open-warn.pl #!/usr/bin/perl print "Testing bad file permissions\n"; $filename = "test-read"; open( FILE, "< $filename" ) or warn "Can't open $filename : $!"; close FILE; print "Testing bad folder permissions\n"; $filename = "content/test-read"; open( FILE, "< $filename" ) or warn "Can't open $filename : $!"; close FILE; print "Testing missing file\n"; $filename = "test-read-missing"; open( FILE, "< $filename" ) or warn "Can't open $filename : $!"; close FILE; The “\n” at the end of each print statement tells perl to insert a newline. It also tells perl to flush the print buffer (print right now). Without them, all the error messages would print above all the print statements. (Go ahead and try it) The output from step 5: Code: [~/public_html]# strace -f -o perl-test.strace ./open-warn.pl Testing bad file permissions Can't open test-read : Permission denied at ./open-warn.pl line 5. Testing bad folder permissions Can't open content/test-read : Permission denied at ./open-warn.pl line 10. Testing missing file Can't open test-read-missing : No such file or directory at ./open-warn.pl line 15. [~/public_html]# So we have two “Permission Denied” and one “No such file” error. We can grep for either EACCES or “Permission denied” in the strace and find the errors quickly and easily: Code: [~/public_html]# grep EACCES ../perl-test.strace 8817 open("test-read", O_RDONLY) = -1 EACCES (Permission denied) 8817 open("content/test-read", O_RDONLY) = -1 EACCES (Permission denied) When we grep for ENOENT or “No such file”, however, we get quite a few more. Below I have two examples of pulling out some of the lines I know I don’t need. Unfortunately, determining what you can safely exclude is highly context dependent. Code: [~/public_html]# grep ENOENT ../perl-test.strace | grep -v "/usr/" 8817 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) 8817 stat("/home/mary/perl5/lib/perl5/5.10.1/x86_64-linux-thread-multi", 0x7fff625b1400) = -1 ENOENT (No such file or directory) 8817 stat("/home/mary/perl5/lib/perl5/5.10.1", 0x7fff625b1400) = -1 ENOENT (No such file or directory) 8817 stat("/home/mary/perl5/lib/perl5/x86_64-linux-thread-multi", 0x7fff625b1400) = -1 ENOENT (No such file or directory) 8817 stat("/home/mary/perl5/lib/perl5/5.10.0", 0x7fff625b1400) = -1 ENOENT (No such file or directory) 8817 open("test-read-missing", O_RDONLY) = -1 ENOENT (No such file or directory) or Code: [~/public_html]# grep ENOENT ../perl-test.strace | grep -v "perl5" 8817 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) 8817 open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) 8817 open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) 8817 open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) 8817 open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) 8817 open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) 8817 open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) 8817 open("test-read-missing", O_RDONLY) = -1 ENOENT (No such file or directory) Either way, it’s the last line that gives us what we’re looking for. Now, this was a little easy - we knew what to look for because the error message told us. Most of the time, you’ll be using strace because the error messages are not so helpful. Lets modify our open-warn.pl program and call it open-silent.pl: Code: [~/public_html]# cat open-silent.pl #!/usr/bin/perl print "Testing bad file permissions"; $filename = "test-read"; open( FILE, "< $filename" ) or warn "Can't open an important file"; close FILE; print "Testing bad folder permissions"; $filename = "content/test-read"; open( FILE, "< $filename" ) or warn "Can't open an important file"; close FILE; print "Testing missing file"; $filename = "test-read-missing"; open( FILE, "< $filename" ) or warn "Can't open an important file"; close FILE; Now when we run strace we get output like this: Code: [~/public_html]# strace -f -o perl-quiet.strace ./open-silent.pl Can't open an important file at ./open-silent.pl line 5. Can't open an important file at ./open-silent.pl line 10. Can't open an important file at ./open-silent.pl line 15. Testing bad file permissionsTesting bad folder permissionsTesting missing [~/public_html]# Now, it still gives you some hints - like the line number. But every single one of those lines reads: Code: open( FILE, "< $filename" ) or warn "Can't open an important file"; If the program was long and complicated, we’d have no way of knowing if it was the same file that was being attempted multiple times or if there were multiple files. Now the strace file tells us something useful. I would start by checking for permission errors, since they are the most common reason that a file can’t be opened: Code: [~/public_html]# grep EACCES perl-quiet.strace 1489 open("test-read", O_RDONLY) = -1 EACCES (Permission denied) 1489 open("content/test-read", O_RDONLY) = -1 EACCES (Permission denied) That gives us two of our three errors. Honestly, if I were debugging for real, at this point I would just check all the permissions on public_html and it’s subfolders and files. Because permission issues usually come in large batches on a webserver. Once I’ve fixed the permission issues on test-read and content, I can run strace again. Now I only have one error message. I’m going to check for permission issues again, because sometimes when you fix one permission issue (especially on a directory) it uncovers more permission issues. Unfortunately for me, there are no more permission issues: Code: [~/public_html]# grep EACCES perl-quiet2.strace [~/public_html]# So now I have to think of what else might cause a file not to be opened. Maybe the file just isn’t there. So I grep for ENOENT and man there’s a lot of output. I always start at the bottom - especially if the error stops the program from running to completion. In this case, the last line is: Code: 1595 open("test-read-missing", O_RDONLY) = -1 ENOENT (No such file or directory) Now I know that a program related file is missing. Depending on what my actual application is depends on how I fix this. In this case, I’m just going to touch the file to create it. Now when I run the script (./open-silent.pl) it does not output any error messages.