BioLinux

From CLAB

Tutorial

The BioLinux tutorial.

Pre-Installation

To run BioLinux, we started with laptops running Windows 7. Before we installed BioLinux, we installed the following software (most of which is Open Source):

McAfee VirusScan

PuTTY

Adobe PDF reader

Xming

Mozilla

WinSCP

Altiris Tools


How To Run on the NEBC BioLinux Virtual Machine

The NERC Environmental BioInformatics Centre (NEBC) has created a virtual machine aimed specifically at BioInformatics, called Bio-Linux. It is an Ubuntu Linux 10.04 machine, running the Z shell, that comes with over 500 BioInformatics programs pre-installed. It also includes preconfigured PostgreSQL and MySQL databases. It is easy to download and to get started with.


1) Download BioLinux and install it as a virtual machine. At UNO, we added it from a VMware environment.

2) Log in as root

3) Create a user account.

Go to the System menu in the top taskbar, and select: System --> Administration --> Users and Groups.

- OR - from the command line:

>adduser <username>

Pick a password and answer the questions asked by the system.

4) You must add <username> to the "ssh" group, in order for <username> to ssh to BioLinux.

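>groups <username> (Displays the groups that <username> currently belongs to.)

>usermod -G <grp1>,<grp2>,ssh <username> (Existing groups have to be listed, or they will be dropped.)

- OR - append "ssh" to the existing groups without listing them:

>usermod -a -G ssh <username>

Then log out and log in again for the group change to take effect.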

5) The desktop has some excellent icons to help get started. There are two User Guides, a folder of Sample Data for your programs, and an icon to a search screen that will display the documentation for the installed programs.

The top User Guide has information and instructions for the administrator of the VM. It contains information on setting up user accounts, getting software updates for the Bio-Linux VM, system configuration, installing new software, and many other things.

The second User Guide is more of a user tutorial, and is more detailed than this document. It explains the basic Linux commands, and then some of the installed BioInformatics programs. It also has a few exercises to help you learn to navigate the system. It is written for the Unix novice; experienced Unix users can skip to "Part Two: Introduction to Bioinformatics on Bio-Linux".

The icon called "BioInformatics Docs" contains some documentation on all of the preinstalled BioInformatics programs in Bio-Linux.

6) The root password is locked in Ubuntu by default. You can issue root commands using "sudo".

7) The OpenOffice software suite, and editors "gedit", "nano", "pico", and "vi" are included.

8) The top Bio-Linux taskbar. From left to right the things you see in the top taskbar by default are:

a. Applications menu - the Bioinformatics sub-heading has links to many of the installed programs

b. Places menu - links to folders and installed hardware

c. System menu - to customize and administer your system

d. Firefox Web Browser

e. Ubuntu Help file

f. Evolution Mail (reading and sending emails)

g. Terminal (opens a terminal window)

You may or may not then find an exclamation mark in a bubble. If you do, it is the Package Manager alerting you that there are updates available for your system. Then, there might be a network symbol or up and down arrows – this means you have a network connection. Alternatively, you may see a little red exclamation mark. This means you have no network connection.

h. The Volume Control icon for audio

i. The envelope icon allows you to set up Chat or Mail on the system.

j. The System Clock - you can also click on this to open a calendar.

k. The networking menu – displays the username. Use this menu for social networking, including chat accounts, logging into Ubuntu One, etc.

l. The Power button – mousing over this brings up a menu with options to:

○ Lock screen

○ Switch user

○ Log out

○ Suspend

○ Hibernate

○ Restart, or

○ Shut down the computer


9) The bottom taskbar displays icons for open windows and, at the right-hand side, links to four virtual desktops and the Recycle bin.

10) Download the example files for the tutorial:

Click on the "Applications" menu on the top task bar. Open Applications -> Accessories -> Terminal

In the terminal window, enter the commands:

wget http://nebc.nerc.ac.uk/courses/Bio-Linux/bioinf_files.tar.gz

tar -xvzf bioinf_files.tar.gz
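
The unpacking step can be tried out without touching the real download. The sketch below is only an illustration and is not part of the NEBC course material: it builds a tiny stand-in archive (the demo.fasta file and the temporary directory are made up for the example), lists its contents, and then extracts it. The same two tar calls should apply to bioinf_files.tar.gz once wget has fetched it.

```shell
#!/bin/sh
# Stand-in for the downloaded archive: build a tiny .tar.gz in a temp directory.
# (All file names here are made up for the example.)
workdir=$(mktemp -d)
mkdir -p "$workdir/src/bioinf_files"
echo ">demo_sequence" > "$workdir/src/bioinf_files/demo.fasta"
tar -czf "$workdir/demo.tar.gz" -C "$workdir/src" bioinf_files

# List the contents without extracting (t = list, z = gzip, f = archive file).
tar -tzf "$workdir/demo.tar.gz"

# Extract into a chosen directory. The f option comes last in the option
# cluster because the archive name must follow it directly.
mkdir -p "$workdir/out"
tar -xzf "$workdir/demo.tar.gz" -C "$workdir/out"
ls "$workdir/out/bioinf_files"
```

Listing the archive first is a cheap way to see which directory it will create before anything is written to disk.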


Note: the Bio-Linux VM comes with software already installed, but not with data. The example files here contain an example Blast DB, but it is quite small, and insufficient for actual research work.
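
Returning to step 4, it can be useful to confirm that an account really is in the "ssh" group before trying to connect. The following is a minimal sketch, assuming a POSIX shell and the standard id command; the in_group helper is a hypothetical name introduced here, not something shipped with Bio-Linux.

```shell
#!/bin/sh
# in_group USER GROUP: succeed if USER belongs to GROUP.
# (Hypothetical helper for illustration; relies only on the standard "id" command.)
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Check the account given as the first argument, or the current user by default.
user="${1:-$(id -un)}"
if in_group "$user" ssh; then
    echo "$user is in the ssh group"
else
    echo "$user is not in the ssh group; as root, run: usermod -a -G ssh $user"
fi
```

Group changes only show up in new login sessions, so run the check after logging out and back in.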


Running BioInformatics Programs (EMBOSS)

EMBOSS is "The European Molecular Biology Open Software Suite", and it can be run graphically on Bio-Linux. (More information can be found in the documentation pages on Bio-Linux, or from the official EMBOSS overview, http://emboss.sourceforge.net/what/#Overview .)

1) Start Jemboss:

Click on the "Applications" menu on the top task bar. Open Applications -> Bioinformatics -> Jemboss

Click on each of the categories (eg., Alignment, Display, etc.) to see what programs are listed.

2) Then, click on "Feature Tables" and choose "coderet".

At the bottom right hand side of the window is an "i" button. Click on it to open the documentation window, and read what coderet does.

3) At the top of the Jemboss window, fill in a "Sequence Filename" (e.g., embl:BX255937).

4) Fill in an output filename in the "output file name" box (e.g., jemboss_bx.coderet). Remember to give your files descriptive and distinctive names, because a new file will overwrite an earlier file with the same name.

5) Hit the "GO" button at the bottom of the window.

6) When the program has finished, a new window called Saved Results should appear. (Don't be fooled: your results haven't been saved yet!) There should be a number of tabs in that window. One will have the name you entered in the output file name box (e.g., jemboss_bx.coderet). The others will likely have names like bx255937.mrna, bx255937.noncoding, etc.

7) Take a look at the type of information in each tab. Notice that:

➢ each of the tabs that contains sequence information contains multiple sequences

➢ the command line you would use to run this program identically to how you just ran it via Jemboss is provided to you under the cmd tab. This will be useful later.

8) Save the data to a local file. Click on the tab with the name ending in .mrna. Under the "File" menu, choose "Save to Local File..." and save this to a location you can find again (e.g., under your bioinf_files directory). Give it a name that will distinguish it from later work (e.g., jemboss_bx.mrna). Do not close the "Saved Results" window as we want to refer to the information under the "cmd" tab later.

9) Go back to the main Jemboss window, and choose NUCLEIC -> REPEATS -> palindrome from the list of programs.

10) Next to the box under "Sequence Filename" (near the top of the page), there is a "Browse files..." button. Use that to find the file you just saved. Note that you'll have to change the "Files of Type:" option to "All Files" to find your saved file, because it has a .mrna suffix.

11) Check that you're happy with all the required options, and give a filename in the output file name box (e.g., jemboss_palin.txt). Then press the GO button.

12) Scan through the results to see what has been returned to you.
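
Step 7 notes that the sequence tabs hold multiple sequences. Once a tab has been saved to a local file, the number of records can be checked from the terminal. The sketch below assumes the saved file is in FASTA format (each record starts with a ">" header line); the demo file and the count_fasta helper are made up for illustration and stand in for a real file such as jemboss_bx.mrna.

```shell
#!/bin/sh
# count_fasta FILE: print the number of FASTA records (header lines starting with ">").
count_fasta() {
    grep -c '^>' "$1"
}

# Small stand-in file so the sketch runs anywhere; replace with your saved file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
>seq1 first demo record
ATGGCGTACGTTAG
>seq2 second demo record
ATGCCCGGGTTTAAATAG
EOF

count_fasta "$demo"              # prints 2 for the demo file
grep '^>' "$demo" | cut -c2-     # header lines without the leading ">"
```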



Running BioInformatics Programs (Blast)

Blast is the "Basic Local Alignment Search Tool", and it can be run from the command line on Bio-Linux. It searches locally installed Blast databases, such as the example database downloaded in the first section (as part of http://nebc.nerc.ac.uk/courses/Bio-Linux/bioinf_files.tar.gz). (This is an alternative to the web-based interfaces that already exist, e.g., http://blast.ncbi.nlm.nih.gov/Blast.cgi .)

Do a blast of "cd4_cerae.fasta" against the "sprot" database

   * Protein Search with a protein query (blastp) 
  #blast database is "sprot", in the blastdb subdirectory; input search file is "cd4_cerae.fasta"; output file is "cd4_cerae.blastp"
  blastall -p blastp -d blastdb/sprot -i cd4_cerae.fasta -o cd4_cerae.blastp
   * Protein Search with a nucleotide query (blastx) 
  #blast database is "sprot", in the blastdb subdirectory; input search file is "unknown.fasta"; output file is "unknown.blastx"
  blastall -p blastx -d blastdb/sprot -i unknown.fasta -o unknown.blastx
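
Both commands follow the same pattern: the program name is passed with -p, and the output file is named after the query, with the program name as its suffix. A small wrapper can make that convention explicit. This is only a sketch: blast_cmd is a hypothetical helper, it uses just the blastall options shown above, and it prints the command instead of running it so that the command can be checked first.

```shell
#!/bin/sh
# blast_cmd PROGRAM QUERY [DB]: print a blastall command line that follows the
# naming convention used above (query.fasta -> query.PROGRAM).
blast_cmd() {
    program="$1"
    query="$2"
    db="${3:-blastdb/sprot}"          # default database from this tutorial
    out="${query%.fasta}.$program"    # strip ".fasta", append the program name
    echo "blastall -p $program -d $db -i $query -o $out"
}

blast_cmd blastp cd4_cerae.fasta
blast_cmd blastx unknown.fasta
```

To run a command rather than print it, the output can be piped to sh (for example, blast_cmd blastp cd4_cerae.fasta | sh) from the directory that contains the blastdb subdirectory.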

Remote Access

The BioLinux machine can be accessed remotely through ssh or NX.

SSH provides command line access to BioLinux, just as it does to any other Linux box. From a remote machine, enter the command:

>ssh <username>@<ip address>

NX access is provided to enable full graphical desktop access. To use it, you must first install the NX client on your remote machine. It can be downloaded from NoMachine, among others. (The NX client is already installed on the BioLinux VM.) Then launch the NX client and click on "Configure". Type in the address of the BioLinux machine as the "Host", and change the desktop type to "Gnome". Then log in with your username and password.


JCVI Cloud BioLinux

The J. Craig Venter Institute (JCVI) has built Cloud BioLinux, a publicly available virtual machine that runs on cloud computing platforms and is based on the NEBC BioLinux machine. The image is stored on Amazon's EC2 platform, where it is available for copying free of charge. More information can be found here.

Adding User to use NXclient

Become root and run the following commands:

  • Add user to the box
adduser userX
  • Add the user to the ssh group
vi /etc/group
Append the username to the end of the ssh: line, for example:
ssh:x:102:userA,userB,userX
  • Add the user for nx-server
nxserver --adduser userX
  • Open the key file and copy the content of the file
cat /var/lib/nxserver/home/.ssh/client.id_dsa.key
  • Open the NX client, click on the "Configure" button, and press "Key"
  • Paste the key content from the previous step and click on "Save"
  • Put in your username and password to log in to the Bio-Linux box
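
The account-related commands above can be collected into one script. The following is a sketch under stated assumptions: setup_nx_user and run are hypothetical helper names, usermod -a -G is used in place of editing /etc/group by hand, and the script only prints the commands unless DRY_RUN=0 is set, since they need root. The key-copying steps in the NX client still have to be done by hand.

```shell
#!/bin/sh
# run CMD...: print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# setup_nx_user USER: create the account, add it to the ssh group,
# and register it with the NX server.
setup_nx_user() {
    run adduser "$1"
    run usermod -a -G ssh "$1"     # -a appends, so existing groups are kept
    run nxserver --adduser "$1"
}

setup_nx_user userX
```

Reviewing the printed commands first, and only then rerunning as root with DRY_RUN=0, keeps a typo in the username from creating an unwanted account.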