MA3

From CLAB


"The Microarray Project"


Overview

  • Lead developer: Sajan Singh Suwal
  • MA3 development instance (also "production" for now).
    • Development is happening in various *2.php files and in these tables: matrixinfo_test2, organisminfo_test2, probeinfo_test2
  • Currently open tickets in RT.
  • Source code: MA3 has its own SVN repository.
  • MA3 Manual

PostgreSQL database

Tables

  • matrixinfo - 9.8M rows
    • Each row holds the brightness stats for a single X,Y position in a single experiment
    • record_id should be experiment_id
    • each row is small. That's good.
    • outliner should be outlier
  • organisminfo - 27 rows
    • Actually 1 row per experiment, so I would have called this the experiment table.
    • record_id should be experiment_id
  • probeinfo - 123K rows
    • target sequences badly need to be normalized out of here
    • probe_id should be probeset_id
    • record_id should be organism_id
psql -U schandio -d microarraydata
  \l               # List all databases
  \d               # List all objects in your current database
  \d matrixinfo    # Show the schema of the matrixinfo table

# Dump the schema only (-s). Takes 6 minutes! I have no idea why, for just 3 tables.
pg_dump -s -U schandio -W microarraydata
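
The column renames suggested in the table notes above could be collected into one reviewable SQL file. This is only a sketch: it assumes "outliner" is a column of matrixinfo, the file name rename_columns.sql is made up for illustration, and any PHP or Perl code that reads these columns would have to change at the same time. The script only writes the file; nothing touches the database until the file is passed to psql by hand.

```shell
#!/bin/sh
# Sketch only: write the suggested renames to a SQL file for review.
# Assumption: "outliner" is a matrixinfo column; file name is hypothetical.
SQL_FILE="${SQL_FILE:-rename_columns.sql}"

cat > "$SQL_FILE" <<'EOF'
BEGIN;
ALTER TABLE matrixinfo   RENAME COLUMN record_id TO experiment_id;
ALTER TABLE matrixinfo   RENAME COLUMN outliner  TO outlier;
ALTER TABLE organisminfo RENAME COLUMN record_id TO experiment_id;
ALTER TABLE probeinfo    RENAME COLUMN probe_id  TO probeset_id;
ALTER TABLE probeinfo    RENAME COLUMN record_id TO organism_id;
COMMIT;
EOF

echo "Review $SQL_FILE, then apply with:"
echo "  psql -U schandio -d microarraydata -f $SQL_FILE"
```

Wrapping the statements in BEGIN/COMMIT means a failure on any one rename leaves all three tables unchanged. For the development tables, the same statements would target the *_test2 names instead.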

Maintenance

Routine Data Maintenance Tasks contains many recommendations. I'm running vacuum verbose analyze; now. I have no idea if anyone has ever done this. --Jhannah 11:13, 22 July 2007 (CDT)

That seems to have helped big time...? With using indexed scans and with response times to queries? --Jhannah 11:59, 2 August 2007 (CDT)

Query Tuning

explain [analyze] select * from matrixinfo;

Software

  • .fla files are Macromedia Flash source code.
  • .swf are the compiled Flash objects.
  • experimentList.php handles the normal POST.
  • experimentListA.php is used when you select all probes. It runs in the background and sends its results to a file instead of the screen.

Misc SVN commands

svn checkout svn://klab.ist.unomaha.edu/KLAB/projects/transcriptomics/MArray MArray
svn stat
svn info
svn add
svn diff
svn commit

Progress on MA3 Project

RT103
The probes in both the Flash and the chart displays are now ordered by 
probe start position, and all the associated data correspond to each 
other on the display.
RT113
Sept 10 -
The server is being moved from kiran.homelinux.net to biobase.ist.unomaha.edu, which has
much higher bandwidth available.
new location:
http://biobase.ist.unomaha.edu/KLAB/projects/transcriptomics/MArray/

tasks:
-- back up the current microarray database on kiran.homelinux.net
-- download the dump file to the local machine
-- upload it to biobase
-- install the PostgreSQL package for PHP on biobase
-- set up the database and load the dump file into it
-- set up users and adjust permissions on the database
-- copy the source files to biobase
-- add the source files to the new svn server
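
The database part of the task list above might look roughly like the following. It is a dry-run sketch, not the procedure that was actually used: the dump file name is made up, the schandio user is assumed from the psql examples earlier on this page, and ssh access to both hosts is assumed. With DRY_RUN=1 (the default) each command is only printed.

```shell
#!/bin/sh
# Dry-run sketch of the database steps in the RT113 task list.
# Set DRY_RUN=0 to actually execute the commands.
set -eu

DB=microarraydata
OLD_HOST=kiran.homelinux.net
NEW_HOST=biobase.ist.unomaha.edu
DUMP=microarraydata.sql     # hypothetical dump file name
DB_USER=schandio            # assumption: same user as the psql examples
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate() {
    # Back up the current database on the old server.
    run ssh "$OLD_HOST" "pg_dump -U $DB_USER $DB > $DUMP"
    # Download the dump to the local machine.
    run scp "$OLD_HOST:$DUMP" .
    # Upload it to the new server.
    run scp "$DUMP" "$NEW_HOST:"
    # Create the database on the new server and load the dump into it.
    run ssh "$NEW_HOST" "createdb -U $DB_USER $DB && psql -U $DB_USER -d $DB -f $DUMP"
}

migrate
```

Installing the PHP package, setting up users and permissions, and moving the source files into svn are left out because they depend on how biobase is configured.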


RT15
purpose:
Enhancement to the MA3 tool: the capability to upload .cel files generated in the lab and 
automate the process of adding their data to the MA3 database for multiple organisms.

Update:
Developing a model of how the entire process is going to work.
(.cel files are files containing data from an experiment for a given organism.)

1) The .cel (version 4) file needs to be converted to .cel (version 3) using a Windows program. 
The version 3 .cel file is plain text (sample) and the version 4 .cel file is binary (sample).

2) The converted file is then parsed by a .pl script and the data is loaded into the psql 
database.
3) Since the cel file converter is a Windows binary application which requires user 
interaction, we are developing a Windows application in C# which automates the interaction 
whenever a new cel file that needs to be converted is detected.
4) The dev upload site is at: 
http://biobase.ist.unomaha.edu/KLAB/projects/transcriptomics/MArray/uploader/
5) The CelFileConversionEngine for the Windows server is (CelFileConversionEngine.exe)
  • Can you please replace 'you will be redirected to status page in 7s' with the following: Please do not exit, you will be automatically redirected to the "Status Page"
6) New updates:
----> The Windows system and the Linux system connect via ftp.
----> The complete system has been successfully tested.
----> The complete cel file upload process takes 25-30 mins.
----> If a new organism is detected, the organism info update process takes about 10-20 mins.
----> Modified the query page so that organisms can be selected to query on.
----> Created a new table, organisms, which keeps track of each organism and its record_id.
7) The new organism handler is complete. It takes 3 files per organism and parses them into the 
probeinfo table.
--- Sample files: s_aureus_probe_tab (probe table file), s_aureus_target (target sequence file), 
s_aureus_annot.csv (probeinfo file)
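
The parsing in step 2 above can be illustrated with a small awk sketch. This is not the project's .pl script, which is not reproduced on this page. It assumes the usual version 3 text layout, where an [INTENSITY] section holds a NumberCells= line, a CellHeader= line, and then one X, Y, MEAN, STDV, NPIXELS row per cell. The function name cel_intensity_to_tsv is made up for illustration.

```shell
#!/bin/sh
# Sketch: extract the per-cell rows of a version 3 (text) .cel file
# as clean tab-separated values. Assumes an [INTENSITY] section with
# five whitespace-separated columns per row: X Y MEAN STDV NPIXELS.
cel_intensity_to_tsv() {
    # $1 = path to a version 3 .cel file
    awk '
        { sub(/\r$/, "") }                   # tolerate Windows line endings
        /^\[/ { in_section = ($0 == "[INTENSITY]"); next }
        !in_section { next }                 # ignore every other section
        /=/ { next }                         # skip NumberCells= and CellHeader=
        NF == 5 { print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 }
    ' "$1"
}

# When called with a file argument, print its intensity rows.
if [ "$#" -gt 0 ]; then
    cel_intensity_to_tsv "$1"
fi
```

Each output row corresponds to one X,Y position of one experiment, which matches the description of a matrixinfo row given earlier. The rows would still need an experiment identifier added, and the columns mapped to the real table layout, before being loaded.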

Flow diagram for the cel file upload/conversion/update System

Image:system_diagram1.jpg

Entity Relation diagram of the microarray query system

Image:ER_diagram.jpg

Note: The tables currently in use are matrixinfo_test2, organisminfo_test2 and probeinfo_test2 
(for testing and debugging purposes).