Linux for Biologists

Linux is an operating system (OS), like Windows, that runs on many computers. It orchestrates the various parts of the computer (the processor, the on-board memory, the disk drives, keyboards, video monitors, etc.) to perform useful tasks.

A Unix(-like) operating system such as Linux comprises three parts:

  • kernel
  • standard utility programs
  • system configuration files

Unix is case sensitive: the command “ls” is not the same as “Ls”.

Why Linux?

  • The majority of bioinformatics/computational biology software is developed only for Linux
  • Most programs are command-line tools (i.e., launched by entering a command in a terminal window rather than through a GUI)
  • Various graphical and/or web user interfaces exist (e.g., Galaxy, iPlant Discovery Environment, BioHPC Web), but they often struggle to provide the level of flexibility needed in cutting-edge research
  • The versatile scripting and system tools readily available on Linux allow customization of any analysis

Working on a Linux computer

Logging in to a Unix system requires the following information:

  • Username
  • Password
  • Server you are logging in to

ssh -X <your_username>@<host_name>

You can connect to a Linux computer from your laptop using remote access software (typically an ssh client or a VNC client):

  • ssh: Secure Shell – provides access to a text (alphanumeric) terminal
  • VNC: Virtual Network Computing – provides access to graphical features (Desktop, GUIs, File Manager, Firefox, …)

Linux is a multi-access, multi-tasking system: multiple users may be logged in and run multiple tasks on one machine at the same time, sharing resources (CPUs, memory, disk space)

Logging in from Windows PC

Install remote access software (PuTTY).

Use PuTTY to open a terminal window on the reserved workstation using the ssh protocol:

  • Configure X11 forwarding (if you intend to run graphical software).
  • When connecting for the first time, a window will pop up about “caching server hostkey”; answer “Yes”. The window will not appear next time around.
  • Adjust colors, if desired.
  • Save the configuration (e.g., under the machine’s name).
  • While you are typing your password, the terminal will appear frozen (nothing is echoed); this is on purpose!
  • You may open several terminal windows, if needed (in PuTTY you can use the “Duplicate Session” function).

Logging in from Mac (or other Linux box)

Use the native ssh client (it is already there, no need to install anything). Launch the Mac’s terminal window and type:

ssh -Y bukowski@cbsuwrkstX.tc.cornell.edu

Copying files between your computer and the server

To copy files to the server, run the following on your workstation or laptop:

scp -r <path_to_directory> <your_username>@<host_name>:    # the trailing ':' means your home directory on the server

To copy files from the server, run the following on your workstation or laptop:

scp -r <your_username>@<host_name>:<path_to_directory> .   # '.' means the current local directory

Logging out of a Linux machine

While in a terminal window, type exit or press Ctrl-d; this will close the current terminal window.

Interacting with Linux in terminal window (Command line interface to the OS)

The user communicates with a Linux machine via commands typed in the terminal window. Commands are interpreted by a program referred to as the shell, an interface between Linux and the user. We will be using the shell called bash (another popular shell is tcsh). Typically, each command is typed on one line and “entered” by hitting the Enter key on the keyboard.

The basic form of a Unix command is:

command [-options] [arguments]

Example:

ls -l /tmp

Here:

  • “ls” is the command
  • “-l” is the option
  • “/tmp” is the argument

pwd – shows the present working directory

Aborting a shell command

Most Unix systems allow you to abort the current command by typing Control-C.

Getting help in Linux

  • Use the man command, followed by the name of the command you need help with.
  • Type ‘man ls’ – to see the manual page for the “ls” command.

Orientation

Viewing and changing the present working directory:

pwd               # Get full path of the present working directory (same as “echo $PWD”)

ls                # Content of pwd
ls -l             # Same as ls, but provides additional info on files and directories
ls -a             # Includes hidden files (.name) as well
ls -R             # Lists subdirectories recursively
ls -t             # Lists files by modification time, newest first

cd <dir_name>     # Switches into specified directory
cd                # Brings you to the top level of your home directory
cd ..             # Moves one directory up
cd ../../         # Moves two directories up (and so on)
cd -              # Goes back to where you were previously (before the last directory change)
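A minimal sketch of these navigation commands, run inside a throwaway directory created with mktemp -d (an assumption, so nothing in your real home directory is touched); the directory names project, data and raw are hypothetical:

```shell
# Create a scratch directory tree (names are hypothetical)
work=$(mktemp -d)
mkdir -p "$work/project/data/raw"

cd "$work/project/data/raw"   # switch into a specified directory
pwd                           # ends in .../project/data/raw
cd ..                         # one directory up
pwd                           # ends in .../project/data
cd ../..                      # two directories up: back to the scratch directory
here=$(pwd)
cd - > /dev/null              # back to .../project/data ("cd -" also prints the path)
back=$(pwd)
echo "$here"
echo "$back"
```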

The tilde symbol (~) at the beginning of a word is interpreted as the path to your home directory. This works anywhere on the command line:

echo ~            # View the full (complete) path of your home
find ~            # List all your files (including everything in sub-directories)
ls ~              # List the top level files of your home directory
du -sch ~/*       # Calculate the file sizes in your home
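The sketch below illustrates when the shell performs this tilde expansion. It relies on standard shell behavior: ~ is replaced only when it starts a word, and never inside quotes.

```shell
expanded=$(echo ~)      # the shell replaces ~ with your home path before echo runs
quoted=$(echo "~")      # inside quotes no expansion happens: a literal ~
mid_word=$(echo a~b)    # ~ in the middle of a word is left alone
echo "$expanded"        # same as: echo $HOME
echo "$quoted"
echo "$mid_word"
```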

Working with Files and Directories

Creating and removing directories

mkdir <dir_name>   # Creates specified directory
rmdir <dir_name>   # Removes empty directory

Removing files and directories

rm <file_name>     # Removes file
rm -r <dir_name>   # Removes directory including its content; if rm is set up to ask
# for confirmation (e.g. aliased to 'rm -i'), the '-f' option turns it off

Copying or moving

cp <name> <path>   # Copy file/directory as specified in path (-r to include content in directories)
mv <name1> <name2> # Renames directories or files
mv <name> <path>   # Moves file/directory as specified in path
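To try the commands above safely, the sketch below runs them inside a scratch directory made with mktemp -d; all file and directory names (project, seq1.txt, archive) are hypothetical.

```shell
work=$(mktemp -d)
cd "$work"
mkdir project                              # create a directory
echo "ACGT" > project/seq1.txt             # create a small file inside it
cp project/seq1.txt project/seq2.txt       # copy a file
mv project/seq2.txt project/seq_copy.txt   # rename it
mkdir archive
mv project/seq_copy.txt archive/           # move it into another directory
cp -r project project_backup               # copy a directory with its content (-r)
rm -r project_backup                       # remove a directory with its content
rmdir archive || echo "rmdir refused: archive is not empty"
ls -R                                      # show what is left
```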

Finding files, directories and applications

find -name "*pattern*"            # searches for *pattern* in and below current directory
find /usr/local -name "*blast*"   # finds file names *blast* in specified directory
find /usr/local -iname "*blast*"  # same as above, but case insensitive

Additional useful arguments: -user <user name>, -group <group name>, -ctime <number of days ago changed>

find ~ -type f -mtime -2   # finds all files you have modified in the last two days
locate <pattern>           # finds files and dirs recorded in the locate database (refreshed periodically by updatedb)
which <application_name>   # location of application
whereis <application_name> # searches a set of standard directories for executables (and their sources and manual pages)
dpkg -l | grep mypattern   # find installed Debian packages and refine search with grep pattern
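A minimal sketch of name-based searches on a small, hypothetical directory tree, showing the difference between -name (case sensitive) and -iname (case insensitive):

```shell
work=$(mktemp -d)
# Hypothetical files: "blast" in lower case, "BLAST" in upper case, and an unrelated file
mkdir -p "$work/db/sub"
touch "$work/db/nr.BLAST.out" "$work/db/sub/query_blast.txt" "$work/db/readme.md"
find "$work" -name "*blast*"          # case sensitive: only .../sub/query_blast.txt
find "$work" -iname "*blast*"         # case insensitive: also .../nr.BLAST.out
find "$work" -type f -mtime -2        # all three files (they were just created)
```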

Finding things in files

grep pattern file           # prints lines in 'file' where 'pattern' appears;
# if the pattern contains shell special characters such as >, put it in single quotes: '>'

grep -H pattern file        # -H prints the file name in front of each matching line
grep 'pattern' file | wc    # pipes lines with pattern into word count wc
# wc arguments: -c: show only bytes, -w: show only words,
# -l: show only lines; help on regular expressions:
# $ man 7 regex or man perlre

find /home/my_dir -name '*.txt' -print0 | xargs -0 grep -c '^'  # counts lines in many
# files and reports each count along with the individual file
# name; find and xargs are used to get around the shell's
# argument-list length limit when applying this to thousands of files
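As an illustration, the sketch below runs grep on a tiny, made-up FASTA file. Note that '>' is quoted: unquoted, the shell would treat it as output redirection. The -i (ignore case) option in the third command is a standard grep option not listed above.

```shell
work=$(mktemp -d)
# A tiny, made-up FASTA file: header lines start with '>'
cat > "$work/seqs.fasta" <<'EOF'
>seq1 heat shock protein
ATGCGTACGT
>seq2 hypothetical protein
GGCCTTAA
>seq3 HEAT SHOCK protein
ATATATAT
EOF
grep -c '>' "$work/seqs.fasta"                   # number of sequences: counts header lines
grep 'heat shock' "$work/seqs.fasta"             # case sensitive: only the seq1 header
grep -i 'heat shock' "$work/seqs.fasta" | wc -l  # -i ignores case: both heat shock headers
grep -H 'hypothetical' "$work/seqs.fasta"        # file name printed before the match
```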

Permissions and Ownership

Assign read and execute permissions to user and group

chmod ug+rx my_file

Change ownership

chown <user> <file or dir>         # changes user ownership
chgrp <group> <file or dir>        # changes group ownership
chown <user>:<group> <file or dir> # changes user & group ownership
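A short sketch of chmod ug+rx on a hypothetical script, my_script.sh. It sets umask 022 first (an assumption, so that new files start as -rw-r--r-- and the result is predictable); ls -l shows the permission string before and after.

```shell
umask 022                    # assumption: new files start as -rw-r--r--
work=$(mktemp -d)
cd "$work"
# Hypothetical two-line shell script
printf '#!/bin/sh\necho "hello from my_script"\n' > my_script.sh
ls -l my_script.sh           # -rw-r--r-- : nobody may execute it yet
chmod ug+rx my_script.sh     # add read and execute for user and group
ls -l my_script.sh           # -rwxr-xr-- : user and group may now run it
./my_script.sh               # prints: hello from my_script
```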

Useful Unix Commands

df          # disk space
free -g     # memory info in gigabytes (-m for megabytes)
uname -a    # shows tech info about machine
bc          # command-line calculator (to exit type 'quit')
wget ftp://ftp.ncbi.nih.... # file download from web
/sbin/ifconfig # gives IP and other network info (on newer systems: ip addr)
ln -s original_filename new_filename # creates symbolic link to file or directory
du -sh      # displays disk space usage of current directory
du -sh *    # displays disk space usage of individual files/directories
du -s * | sort -nr # shows disk space used by different directories/files sorted by size
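A minimal sketch of ln -s: a hypothetical short name (genome.fa) pointing at a longer file name, a common way to avoid copying large reference files.

```shell
work=$(mktemp -d)
cd "$work"
echo "ACGT" > reference_genome_v1.fa        # hypothetical reference file
ln -s reference_genome_v1.fa genome.fa      # genome.fa now points at the original
ls -l genome.fa                             # shows: genome.fa -> reference_genome_v1.fa
cat genome.fa                               # reading the link reads the original
du -sh .                                    # disk space used by the current directory
```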

cat file1 file2 > cat.out     # concatenates files into output file 'cat.out'
paste file1 file2 > paste.out # merges lines of files and separates them by tabs (useful for tables)
cmp file1 file2               # tells you whether two files are identical
diff fileA fileB              # finds differences between two files
head -n <number> file         # prints first lines of a file
tail -n <number> file         # prints last lines of a file
split -l <number> file        # splits file into smaller files of <number> lines each
csplit -f out fasta_batch "%^>%" "/^>/" "{*}"  # splits FASTA batch file into many files at '>'
sort file                     # sorts single file, many files and can merge (-m) them, -b ignores leading white space
sort -k 2,2 -k 3,3n input_file > output_file   # sorts table by column 2 alphabetically and column 3 numerically; '-k' for column, '-n' for numeric
sort input_file | uniq > output_file           # uniq removes adjacent duplicate lines (hence sort first) and creates a file/table with unique lines
join -1 1 -2 1 table1 table2  # joins two tables based on specified column numbers
# (-1 1: column 1 of table1; -2 1: column 1 of table2). It assumes that the join fields are sorted. If that is not the case, use the next command:

sort table1 > table1a; sort table2 > table2a; join -a 1 -t "$(printf '\t')" table1a table2a > table3
# '-a 1' additionally prints the lines of table 1 that have no match in table 2;
# by default only lines whose join field occurs in both tables are printed.
# '-t' with a tab character makes join use tabs as field separator for input and output.
# By default fields are separated by blanks (input) and a single space (output)!

cat my_table | cut -d , -f1-3   # cut prints only specified sections of a table;
# -d specifies the comma as column separator here (tab is
# default), -f specifies column numbers.
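The sketch below chains several of these commands on two small, made-up tab-separated tables (gene ID plus a value or an annotation). It sets LC_ALL=C, an assumption added so that sort and join compare text the same way.

```shell
export LC_ALL=C          # assumption: one collation for both sort and join
work=$(mktemp -d)
cd "$work"
tab=$(printf '\t')       # a literal tab character for join -t
# Hypothetical tables: gene ID <tab> expression value, gene ID <tab> annotation
printf 'geneC\t7\ngeneA\t5\ngeneB\t2\ngeneA\t5\n' > expression.txt
printf 'geneB\tkinase\ngeneA\ttransporter\ngeneD\tunknown\n' > annotation.txt

sort expression.txt | uniq > expression_uniq.txt   # sort first: uniq only removes adjacent duplicates
sort annotation.txt > annotation_sorted.txt         # join expects sorted join fields
# -a 1 keeps geneC, which has no annotation; -t makes tabs the field separator
join -a 1 -t "$tab" expression_uniq.txt annotation_sorted.txt > merged.txt
cat merged.txt           # geneA 5 transporter / geneB 2 kinase / geneC 7 (tab-separated)
cut -f1,3 merged.txt     # keep only the gene ID and annotation columns
```

geneD is dropped because it appears only in the annotation table; adding -a 2 would keep it as well.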

Process Management – general

top               # view top consumers of memory and CPU (press 1 to see per-CPU statistics)
who               # Shows who is logged into system
w                 # Shows which users are logged into system and what they are doing
ps                # Shows your processes running in the current terminal
ps -e             # Shows all processes on system; try also '-a' and '-x' arguments
ps aux | grep <user_name> # Shows all processes of one user
ps ax --forest    # Shows the child-parent hierarchy of all processes
ps -o %t -p <pid> # Shows how long a particular process was running.
# (E.g. 6-04:30:50 means 6 days 4 hours ...)

Ctrl-z            # Suspend (put to sleep) the process running in the foreground
fg                # Resume (wake up) a suspended process and bring it into the foreground
bg                # Resume (wake up) a suspended process but keep it running
# in the background

Ctrl-c            # Interrupts (usually kills) the process that is currently running in the foreground
kill <process-ID> # Kills a specific process (sends SIGTERM by default)
kill -9 <process-ID> # NOTICE: "kill -9" is a very violent approach.
# It does not give the process any time to perform cleanup procedures

kill -l           # List all of the signals that can be sent to a process
kill -s SIGSTOP <process-ID> # Suspend (put to sleep) a specific process
kill -s SIGCONT <process-ID> # Resume (wake up) a specific process
renice -n <priority_value> -p <process-ID> # Changes the nice value of a running process; values range
# from -20 to 19, the higher the value the lower the priority; the default is 0,
# and regular users can only increase the value
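A small sketch of these signals from a script, with sleep standing in for a long-running analysis. kill -0 (which only checks that the process exists), $! and wait are standard shell features not listed above.

```shell
# "sleep 60" is a stand-in for a long-running analysis
sleep 60 &                         # start it in the background
pid=$!                             # $! holds the process ID of the last background job
kill -0 "$pid" && echo "job $pid is running"   # -0 sends no signal, only checks existence
kill -s STOP "$pid"                # suspend (put to sleep) the job
kill -s CONT "$pid"                # resume (wake up) the job
kill "$pid"                        # send the default signal, SIGTERM
status=0
wait "$pid" || status=$?           # collect the job's exit status
echo "exit status: $status"        # 143 = 128 + 15, i.e. terminated by SIGTERM (signal 15)
```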

Redirecting Input and Output

Every program you run from the shell has three standard streams (files) open, each of which can be redirected:

  • Standard input: read from a file using the “less-than” sign, like “<filename”
  • Standard output: written to a file using the “greater-than” sign, like “>filename”
  • Standard error: the syntax depends on the kind of shell being used; in bash it is “2>filename”
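A minimal sketch of the three redirections in a POSIX shell such as bash, using hypothetical file names. The last ls line sends both streams to one file; the order matters, since 2>&1 must come after >file.

```shell
work=$(mktemp -d)
cd "$work"
printf 'b\na\nc\n' > names.txt              # standard output -> file (overwrites)
sort < names.txt > sorted.txt               # standard input from one file, output to another
# ls exits with a non-zero status because no_such_file is missing; "|| true" ignores that
ls names.txt no_such_file > ls.out 2> ls.err || true    # stdout and stderr to separate files
ls names.txt no_such_file > both.out 2>&1 || true       # both streams into one file
cat sorted.txt                              # a, b, c on separate lines
cat ls.err                                  # the error message about no_such_file
```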

Pipelines and Filters

Pipe

  • Allows you to connect processes by letting the standard output of one process feed into the standard input of another process.
  • ls -l /etc | more

Grep

  • Grep searches line by line for a specified pattern and outputs any line that matches the pattern.
  • The basic syntax for the grep command is: grep [-options] pattern [file]
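A sketch of a typical filter pipeline on a made-up list of gene hits: grep -v drops comment lines, sort groups identical IDs, uniq -c counts each group and sort -nr puts the most frequent first. The -v (invert match) and -c (count) options are standard but not described above.

```shell
work=$(mktemp -d)
cd "$work"
# Hypothetical list of best-hit gene IDs, one per line, with a comment header
printf '# best hits\ngene1\ngene2\ngene1\ngene3\ngene1\ngene2\n' > hits.txt
# drop comments | group identical IDs | count each group | most frequent first
grep -v '^#' hits.txt | sort | uniq -c | sort -nr
top=$(grep -v '^#' hits.txt | sort | uniq -c | sort -nr | head -n 1)
echo "most frequent: $top"
```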

Text Editors

Vi and Vim

Non-graphical (terminal-based) editor. Vi is guaranteed to be available on any system. Vim is the improved version of vi.

vim my_file_name # open/create file with vim

Once you are in Vim, the most important commands are i, : and ESC. The i key brings you into insert mode for typing. The ESC key brings you out of it. The : key starts command mode at the bottom of the screen. In the following text, all commands starting with : need to be typed in command mode. All other commands are typed in normal mode after hitting the ESC key.

Modifier Keys to Control Vim
  • i # INSERT MODE
  • ESC # NORMAL (NON-EDITING) MODE
  • : # commands start with ':'
  • :w # save command; if you are in editing mode you have to hit ESC first!!
  • :q # quit file; refuses to exit if there are unsaved changes
  • :q! # exits WITHOUT saving any changes you have made
  • :wq # save and quit
  • R # replace MODE
  • r # replace only one character under cursor
  • q: # history of commands (from NORMAL MODE!), to reexecute one of them, select and hit enter!
  • :w new_filename # saves into new file
  • :#,#w new_filename # saves specific lines (#,#) to new file
  • :# # go to specified line number (e.g. :25)

Moving Around in Files

  • $ # moves cursor to end of line
  • A # same as $, but switches to insert mode
  • 0 (zero) # moves cursor to beginning of line
  • CTRL-g # shows at status line filename and the line you are on
  • SHIFT-G # brings you to the bottom of the file; type a line number (it isn't displayed) and then SHIFT-G to go to that line

Search in Files

  • /my_pattern # searches for my_pattern downwards, type n for next match
  • ?my_pattern # searches for my_pattern upwards, type n for next match
  • :set ic # switches to ignore case search (case insensitive)
  • :set hls # switches to highlight search (highlights search hits)

Screen

Starting a New Screen Session

screen                 # Start a new session
screen -S <some-name>  # Start a new session and gives it a name

Commands to Control Screen
  • Ctrl-a d #  Detach from the screen session
  • Ctrl-a c # Create a new window inside the screen session
  • Ctrl-a Space # Switch to the next window
  • Ctrl-a a # Switch to the window that you were previously on
  • Ctrl-a " # List all open windows. Double-quotes " are typed with the Shift key
  • Ctrl-d or type exit # Exit out of the current window. Exiting from the last window will end the screen session
  • Ctrl-a [ # Enters the scrolling mode. Use Page Up and Page Down keys to scroll through the window. Hit the Enter key twice to return to normal mode.

Attaching to Screen Sessions

From any computer, you can attach to a screen session after SSH-ing into a server.

screen -ls             # Lists available sessions and their names
screen -r              # Attaches to an existing session, if there is only one; if more than one
# session is running, it lists the available sessions and their names instead
screen -r <some-name>  # Attaches to a specific session
screen -r <first-few-letters-of-name> # Type just the first few letters of the name
# and you will be attached to the session you need

Destroying Screen Sessions

1. Terminate all programs that are running in the screen session. The standard way to do that is:

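Ctrl-c

2. Exit out of every window with Ctrl-d or exit. Exiting from the last window ends the screen session.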

Archiving and Compressing

Creating Archives

tar -cvf my_file.tar mydir/    # Builds tar archive of files or directories. For directories, execute command in parent directory. Don't use absolute path.    
tar -czvf my_file.tgz mydir/   # Builds tar archive with compression of files or directories. For
# directories, execute command in parent directory. Don't use absolute path.
zip -r mydir.zip mydir/        # Command to archive a directory (here mydir) with zip.
tar -jcvf mydir.tar.bz2 mydir/ # Creates *.tar.bz2 archive

Viewing Archives

tar -tvf my_file.tar
tar -tzvf my_file.tgz

Extracting Archives

tar -xvf my_file.tar
tar -xzvf my_file.tgz
gunzip my_file.tar.gz # or unzip my_file.zip, uncompress my_file.Z,
# or bunzip2 for file.tar.bz2

find -name '*.zip' | xargs -n 1 unzip # this command usually works for unzipping
# many files that were compressed under Windows

tar -jxvf mydir.tar.bz2 # Extracts *.tar.bz2 archive
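A round-trip sketch on a hypothetical directory: build a compressed archive, list it, extract it into a separate directory with tar's -C option (not covered above), and confirm with diff -r that nothing changed.

```shell
work=$(mktemp -d)
cd "$work"                          # run tar from the parent directory of mydir
mkdir mydir
echo "ACGT" > mydir/seq1.fa         # hypothetical sequence files
echo "TTGA" > mydir/seq2.fa
tar -czvf mydir.tgz mydir/          # create a compressed archive using a relative path
tar -tzvf mydir.tgz                 # list contents without extracting
mkdir restore
tar -xzvf mydir.tgz -C restore      # -C extracts into 'restore' instead of the current dir
diff -r mydir restore/mydir && echo "archive round-trip OK"
```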