ALEXA logo and images of BC Cancer Agency

Source code

Current version: 1.17

User Manual: ALEXA-Seq_Manual_v.1.17.pdf

Linux Installation Manual: ALEXA-Seq_LinuxInstall_v.1.17.pdf

Change Log: CHANGELOG.txt


Tarball

Download directly from here: ALEXA_Seq_v.1.17.tar.gz
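As a minimal sketch, assuming a Linux or Mac shell with gzip and tar, the downloaded tarball can be checked for corruption and then unpacked. The function name unpack_release and the default destination directory are hypothetical, not part of the ALEXA-Seq distribution:

```shell
#!/bin/sh
# Sketch: verify and unpack a downloaded ALEXA-Seq release tarball.
# unpack_release and the default directory name are hypothetical choices.

unpack_release() {
    archive="$1"            # e.g. ALEXA_Seq_v.1.17.tar.gz
    dest="${2:-alexa_seq}"  # target directory (hypothetical default)

    # gzip -t tests the compressed stream, catching truncated or corrupt downloads
    if ! gzip -t "$archive" 2>/dev/null; then
        echo "error: $archive is not a valid gzip file" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    # -x extract, -z gunzip, -f read from the archive file, -C extract into dest
    tar -xzf "$archive" -C "$dest"
}
```

For example, `unpack_release ALEXA_Seq_v.1.17.tar.gz /home/user/alexa_seq` unpacks the release into /home/user/alexa_seq.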


SourceForge

You can also obtain the current version (as well as older versions) from SourceForge.net: ALEXA-Seq at SourceForge


Subversion

If you have svn installed, you can check out the code directly from our Subversion repository. For example:

cd /home/user/

svn checkout https://svn.bcgsc.ca/public/ALEXA_Seq/tags/ALEXA_Seq_v.1.17/ alexa_seq
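A minimal sketch for checking out a specific release, assuming a POSIX shell with the svn client installed and assuming that other releases are tagged with the same tags/ALEXA_Seq_v.<version>/ pattern as the example above. The function names alexa_tag_url and checkout_alexa are hypothetical:

```shell
#!/bin/sh
# Sketch: build the Subversion tag URL for a release, following the pattern
# of the example above. Assumption: every release uses this tag naming.

ALEXA_SVN_BASE="https://svn.bcgsc.ca/public/ALEXA_Seq/tags"

alexa_tag_url() {
    version="$1"   # e.g. 1.17
    printf '%s/ALEXA_Seq_v.%s/\n' "$ALEXA_SVN_BASE" "$version"
}

checkout_alexa() {
    version="$1"
    dest="${2:-alexa_seq}"   # local working-copy directory
    svn checkout "$(alexa_tag_url "$version")" "$dest"
}
```

For example, `cd /home/user/ && checkout_alexa 1.17` should be equivalent to the command above. To see which tags exist, `svn list https://svn.bcgsc.ca/public/ALEXA_Seq/tags/` lists the contents of the tags directory.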


Virtual Machines

Finally, you can download the code preinstalled in a VMware Virtual Machine (VM). These VMs include the source code and all of the dependencies and tools needed for the analysis. The VM can be booted from a Mac, Windows or Linux desktop, where it runs as a self-contained 'guest' operating system within a window on your 'host' system. To use the VM you will need to download and install a VMware player. There are several free options, or you can buy VMware Workstation (for Windows/Linux) or VMware Fusion (for Mac). Using the player, you can configure the number of CPUs and the amount of memory used by the VM. In addition to the player, the VM will run on the following VMware platforms: ACE, ESX, Fusion, Server, and Workstation. These VMs are also compatible with public and private 'Cloud' computing services: vCloud, vSphere, vCenter. Our VMs were created and tested with VMware Workstation 7.1 on a Windows XP system with 2 CPUs and 4 Gb of memory (2.5 Gb available to the VM). They can be configured to use up to 8 CPUs/cores, 32 Gb of memory, and 2 Tb of disk space from the host system.
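If you prefer not to use the player's settings dialog, the CPU and memory allocation are stored in the VM's plain-text .vmx file: numvcpus (number of virtual CPUs) and memsize (memory in MB) are standard VMware keys. A minimal sketch, assuming a POSIX shell and that the VM is powered off while the file is edited; set_vmx_option is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: set a key in a VMware .vmx file (edit only while the VM is powered off).
# Existing lines for the key are removed and a single new line is appended.

set_vmx_option() {
    vmx="$1"; key="$2"; value="$3"
    [ -f "$vmx" ] || { echo "no such file: $vmx" >&2; return 1; }
    out="$vmx.new.$$"
    # Keep every line except existing settings of this key (spaces around '=' allowed)
    grep -v "^[[:space:]]*$key[[:space:]]*=" "$vmx" > "$out"
    printf '%s = "%s"\n' "$key" "$value" >> "$out"
    mv "$out" "$vmx"
}
```

For example, `set_vmx_option "ALEXA-Seq 64-bit.vmx" numvcpus 2` followed by `set_vmx_option "ALEXA-Seq 64-bit.vmx" memsize 2048` would give the VM 2 CPUs and 2 Gb of memory, matching the configuration used to test the demonstration.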


Virtual Machine Players (links to external websites open in a new window or tab):

VMware Player (free): For Windows | For Linux

VMware Fusion (free 30 day trial): For Mac

VMware Workstation (free 30 day trial): For Windows or Linux


ALEXA-Seq Virtual Machine Downloads:

For 32-bit Host: ALEXA-Seq 32-bit v.1.12b.tar.gz (1.8 Gb) | ALEXA-Seq 32-bit v.1.12b.zip (1.8 Gb)

For 64-bit Host: ALEXA-Seq 64-bit v.1.12b.tar.gz (1.9 Gb) | ALEXA-Seq 64-bit v.1.12b.zip (1.9 Gb)

Note: The latest version of the VM (v.1.12) incorporates many performance improvements and a simpler process for running the demonstration analysis. It has been tested successfully from start to finish on a Windows XP machine with 2 Gb of memory allocated to the VM.

Note: You must select the version that matches your CPU(s). Unless your computer is more than 2-3 years old, it is very likely to be 64-bit. If you select the wrong version, the VM may fail to boot properly.
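If you are unsure which archive to pick, the sketch below shows one way to check from a Linux or Mac terminal. It assumes an x86 CPU: on Linux, the 'lm' (long mode) flag in /proc/cpuinfo marks a 64-bit capable processor, and on macOS `sysctl -n hw.cpu64bit_capable` prints 1 for one. The function names are hypothetical, and the sketch does not cover Windows:

```shell
#!/bin/sh
# Sketch: decide whether to download the 32-bit or 64-bit VM archive (x86 hosts).

has_lm_flag() {
    # $1: a CPU flags string, e.g. the "flags" line of /proc/cpuinfo.
    # Pad with spaces so 'lm' only matches as a whole word.
    case " $1 " in
        *" lm "*) return 0 ;;
        *) return 1 ;;
    esac
}

cpu_is_64bit() {
    if [ -r /proc/cpuinfo ]; then
        # Linux: inspect the first "flags" line
        has_lm_flag "$(grep -m1 '^flags' /proc/cpuinfo)"
    elif command -v sysctl >/dev/null 2>&1; then
        # macOS: prints 1 on 64-bit capable CPUs
        [ "$(sysctl -n hw.cpu64bit_capable 2>/dev/null)" = "1" ]
    else
        return 1   # unknown platform: do not claim 64-bit
    fi
}

vm_archive_name() {
    # $1: 32 or 64
    printf 'ALEXA-Seq %s-bit v.1.12b.tar.gz\n' "$1"
}
```

Usage: `if cpu_is_64bit; then bits=64; else bits=32; fi; vm_archive_name "$bits"` prints the name of the archive to download.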


For a brief visual introduction to the VM, see the screenshots or the slideshow.


Step by step instructions:

1.) Download and install one of the player options described above (the following steps assume VMware Player).

2.) Download the ALEXA-Seq virtual machine, picking the 32-bit or 64-bit version to match your system.

3.) Unpack the archive. Use 'tar -zxvf ArchiveName.tar.gz' for Linux or Mac and 7-Zip for Windows.

4.) Start VMware player and select 'Open Virtual Machine' from the 'File' menu.

5.) Browse to where you unpacked the ALEXA-Seq VM, select 'ALEXA-Seq XX-bit.vmx', and select 'Open'.

6.) Select 'Edit virtual machine settings' to modify the number of CPUs and amount of memory used.

7.) When ready, select 'Play virtual machine'. The system will now boot inside a window.

8.) To toggle full screen mode press: 'Ctrl+Alt+Enter'.

9.) See the user manual for username and password: ALEXA-Seq_Manual_v.1.17.pdf

10.) If the screen resolution does not look correct, adjust it via System -> Preferences -> Screen Resolution.

11.) Open the file 'DEMO.txt' on the desktop for a demonstration.
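On Linux or Mac, steps 3 and 5 can also be done from a terminal. A minimal sketch, assuming tar (and unzip for the .zip archives) is available; unpack_vm and find_vmx are hypothetical helper names:

```shell
#!/bin/sh
# Sketch: unpack a VM archive (step 3) and print the .vmx file to open (step 5).

unpack_vm() {
    vm_archive="$1"; vm_dir="${2:-.}"
    mkdir -p "$vm_dir" || return 1
    case "$vm_archive" in
        *.tar.gz) tar -xzf "$vm_archive" -C "$vm_dir" ;;
        *.zip)    unzip -q "$vm_archive" -d "$vm_dir" ;;
        *) echo "unsupported archive: $vm_archive" >&2; return 1 ;;
    esac
}

find_vmx() {
    # Print each .vmx file under the directory (names may contain spaces)
    find "$1" -type f -name '*.vmx'
}
```

For example, `unpack_vm "ALEXA-Seq 64-bit v.1.12b.tar.gz" "$HOME/vms" && find_vmx "$HOME/vms"` prints the path to browse to in step 5.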


Virtual Machine Demonstration and System Requirements:

The demonstration described in DEMO.txt involves analyzing a test dataset of several million paired reads (~4 million 2x42-mers) from the MIP101 and MIP/5FU human colorectal cancer cell lines. The purpose of this demonstration is to illustrate the analysis steps on a small dataset that can be analyzed quickly on a modest system. The analysis requires alignment to human reference genome sequences and involves the creation of large temporary files, so your system should have 20-30 Gb of free disk space. Since the VM is self-contained, when you are done testing you can simply delete the directory containing the VM to free this disk space. We tested the VM demonstration on several Windows and Mac computers. Using 2 CPUs and 2 Gb of memory, the demonstration analysis should complete in less than 24 hours.
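Before starting the demonstration, it may be worth confirming that the disk holding the VM directory has enough room. A sketch using the POSIX `df -P` output format; has_free_space is a hypothetical name, and the 30 Gb default simply mirrors the upper end of the estimate above (treated here as 30 x 1024 x 1024 KB):

```shell
#!/bin/sh
# Sketch: succeed if the file system containing a directory has at least N Gb free.

has_free_space() {
    dir="$1"; need_gb="${2:-30}"
    # df -P prints one header line and one data line per file system;
    # -k reports sizes in 1024-byte blocks, and field 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] || return 1
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, `has_free_space "$HOME" 30 || echo "warning: less than 30 Gb free under $HOME"`.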


Database Schema

To assist in the identification of alternative expression events from massively parallel RNA sequence data, we developed the 'ALEXA-Seq' annotation database. Briefly, this database defines expression ‘features’ that can be informative of alternative expression events such as exon skipping, alternative exon boundary usage, inclusion of cryptic exons, and intron retention. For the human genome (build 36 / hg18), a total of ~3.8 million such features were defined. Each feature was annotated with information describing its size, repeat content, protein coding content, mRNA/EST sequence support, cross-species conservation (by examining EST/mRNA alignments from other species), etc., and assigned a descriptive feature name. A detailed description of the values defined for each feature can be downloaded: ALEXA Seq Schema Description. For further details, refer to Griffith et al. Download a database for your species of interest or request a custom database below.


ALEXA-Seq annotation databases available for download


ALEXA_dm_54_54b (Fly)

Species: Drosophila melanogaster
Common name: Fruit Fly
Source EnsEMBL Database: drosophila_melanogaster_core_54_54b
Total features: 534,061 (23% known, 77% predicted, 20% EST/mRNA supported, 1% conserved)
Total sequence bases: 134 million (80% unmasked, 17% coding)
Feature statistics: Stats_dm_54_54b.xls
Download annotation database (146 Mb): dm_54_54b.tar.gz


ALEXA_dr_57_8c (Zebrafish)

Species: Danio rerio
Common name: Zebrafish
Source EnsEMBL Database: danio_rerio_core_57_8c
Total features: 2,827,336 (14% known, 84% predicted, 10% EST/mRNA supported, 5% conserved)
Total sequence bases: 1.4 billion (47% unmasked, 2.5% coding)
Feature statistics: Stats_dr_57_8c.xls
Download annotation database (1.1 Gb): dr_57_8c.tar.gz


ALEXA_gg_54_2l (Chicken)

Species: Gallus gallus
Common name: Chicken
Source EnsEMBL Database: gallus_gallus_core_54_2l
Total features: 2,392,504 (14% known, 86% predicted, 9% EST/mRNA supported, 6% conserved)
Total sequence bases: 1.1 billion (84% unmasked, 2.3% coding)
Feature statistics: Stats_gg_54_2l.xls
Download ALEXA-Seq annotation database (871 Mb): gg_54_2l.tar.gz


ALEXA_hs_53_36o (Human - build36/hg18)

Species: Homo sapiens
Common name: Human
Source EnsEMBL Database: homo_sapiens_core_53_36o
Total features: 3,814,043 (14% known, 86% predicted, 16% EST/mRNA supported, 9% conserved)
Total sequence bases: 3.0 billion (47% unmasked, 1.2% coding)
Feature statistics: Stats_hs_53_36o.xls
Download annotation database (2.1 Gb): hs_53_36o.tar.gz
Download additional junction DBs (60mer-150mer): exonJunctions | exonBoundaries


ALEXA_hs_55_37 (Human - build37/hg19)

Species: Homo sapiens
Common name: Human
Source EnsEMBL Database: homo_sapiens_core_55_37
Total features: 4,639,589 (15% known, 85% predicted, 16% EST/mRNA supported, 8% conserved)
Total sequence bases: 4.3 billion (47% unmasked, 0.9% coding)
Feature statistics: Stats_hs_55_37.xls
Download annotation database (2.8 Gb): hs_55_37.tar.gz


ALEXA_hs_57_37b (Human - build37/hg19)

Species: Homo sapiens
Common name: Human
Source EnsEMBL Database: homo_sapiens_core_57_37b
Total features: 5,270,148 (14% known, 86% predicted, 16% EST/mRNA supported, 9% conserved)
Total sequence bases: 3.0 billion (46% unmasked, 1.2% coding)
Feature statistics: Stats_hs_57_37b.xls
Download annotation database (2.3 Gb): hs_57_37b.tar.gz
Download additional junction DBs (60mer-150mer): exonJunctions | exonBoundaries


ALEXA_hs_60_37e (Human - build37/hg19)

Species: Homo sapiens
Common name: Human
Source EnsEMBL Database: homo_sapiens_core_60_37e
Total features: 5,692,563 (14% known, 86% predicted, 15% EST/mRNA supported, 8% conserved)
Total sequence bases: 3.0 billion (46% unmasked, 1.2% coding)
Feature statistics: Stats_hs_60_37e.xls
Download annotation database (2.3 Gb): hs_60_37e.tar.gz
Download additional junction DBs (60mer-150mer): exonJunctions | exonBoundaries


ALEXA_hs_65_37j (Human - build37/hg19)

Species: Homo sapiens
Common name: Human
Source EnsEMBL Database: homo_sapiens_core_65_37j
Total features: 6,320,728 (14% known, 86% predicted, 15% EST/mRNA supported, 9% conserved)
Total sequence bases: 3.1 billion (46% unmasked, 1.2% coding)
Feature statistics: Stats_hs_65_37j.xls
Download annotation database (6.9 Gb): hs_65_37j.tar.gz
Download additional junction DBs (60mer-150mer): exonJunctions | exonBoundaries


ALEXA_mm_54_37g (Mouse)

Species: Mus musculus
Common name: Mouse
Source EnsEMBL Database: mus_musculus_core_54_37g
Total features: 3,313,018 (14% known, 86% predicted, 17% EST/mRNA supported, 7.6% conserved)
Total sequence bases: 2.6 billion (55% unmasked, 1.4% coding)
Feature statistics: Stats_mm_54_37g.xls
Download annotation database (1.9 Gb): mm_54_37g.tar.gz


ALEXA_mm_60_37m (Mouse)

Species: Mus musculus
Common name: Mouse
Source EnsEMBL Database: mus_musculus_core_60_37m
Total features: 4,076,317 (14% known, 86% predicted, 16% EST/mRNA supported, 7.5% conserved)
Total sequence bases: 2.6 billion (55% unmasked, 1.4% coding)
Feature statistics: Stats_mm_60_37m.xls
Download annotation database (1.9 Gb): mm_60_37m.tar.gz
Download additional junction DBs (60mer-150mer): exonJunctions | exonBoundaries


ALEXA_mm_65_37r (Mouse)

Species: Mus musculus
Common name: Mouse
Source EnsEMBL Database: mus_musculus_core_65_37r
Total features: 4,076,317 (14% known, 86% predicted, 16% EST/mRNA supported, 7.5% conserved)
Total sequence bases: 2.6 billion (55% unmasked, 1.4% coding)
Feature statistics: Stats_mm_65_37r.xls
Download annotation database (1.9 Gb): mm_65_37r.tar.gz
Download additional junction DBs (60mer-150mer): exonJunctions | exonBoundaries


ALEXA_pt_54_21k (Chimp)

Species: Pan troglodytes
Common name: Chimp
Source EnsEMBL Database: pan_troglodytes_core_54_21k
Total features: 3,141,903 (14% known, 86% predicted, 15% EST/mRNA supported, 8% conserved)
Total sequence bases: 3.1 billion (46% unmasked, 1.0% coding)
Feature statistics: Stats_pt_54_21k.xls
Download annotation database (2.0 Gb): pt_54_21k.tar.gz


ALEXA_rn_54_34v (Rat)

Species: Rattus norvegicus
Common name: Rat
Source EnsEMBL Database: rattus_norvegicus_core_54_34v
Total features: 3,203,902 (14% known, 86% predicted, 10% EST/mRNA supported, 10% conserved)
Total sequence bases: 2.7 billion (51% unmasked, 1.3% coding)
Feature statistics: Stats_rn_54_34v.xls
Download annotation database (1.8 Gb): rn_54_34v.tar.gz


ALEXA_sc_54_li (Yeast)

Species: Saccharomyces cerevisiae
Common name: Baker's yeast
Source EnsEMBL Database: saccharomyces_cerevisiae_core_54_li
Total features: 29,361 (26% known, 74% predicted, 0% EST/mRNA supported, 0% conserved)
Total sequence bases: 12.3 million (97% unmasked, 73% coding)
Feature statistics: Stats_sc_54_li.xls
Download annotation database (15 Mb): sc_54_li.tar.gz
Download additional junction DBs (60mer-150mer): exonJunctions | exonBoundaries
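The download file names above appear to follow a simple pattern: the first letters of the genus and species, followed by the EnsEMBL release and assembly suffix of the source core database (for example, homo_sapiens_core_53_36o gives hs_53_36o.tar.gz and the database name ALEXA_hs_53_36o). This pattern is inferred from the listings rather than a documented rule. A sketch, with the hypothetical function name alexa_db_name:

```shell
#!/bin/sh
# Sketch: map an EnsEMBL core database name to the ALEXA-Seq short name
# seen in the download list (inferred convention, not a documented rule).

alexa_db_name() {
    # $1: <genus>_<species>_core_<release>_<assembly>
    # Split on "_core_": $1 = genus_species, $2 = release_assembly.
    # Inputs without exactly one "_core_" print nothing.
    printf '%s\n' "$1" | awk -F'_core_' 'NF == 2 {
        split($1, parts, "_")
        print substr(parts[1], 1, 1) substr(parts[2], 1, 1) "_" $2
    }'
}
```

For example, `db=$(alexa_db_name homo_sapiens_core_60_37e)` sets db to hs_60_37e, so the archive is "$db.tar.gz" and the database is "ALEXA_$db".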



Request an additional species or other custom annotation database

If you are interested in a species that is not included in the list above, you can download the source code and create the database yourself, or you can contact us to request that we generate it for you. By default, we create exon-exon junction sequences with a length of 62 bases (suitable for reads of 36 bases or longer). Before making a request, please read the database schema description above to determine whether the database would be useful for your analysis. Also note that we are limited to species annotated by EnsEMBL (current list of EnsEMBL species: here).
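To see what the 62-base default means for a given read length, here is a worked sketch. It assumes (this is not stated above) that a junction sequence of even length J is built from J/2 bases of each flanking exon. A read of length L, with J/2 < L <= J, that lies entirely within the junction sequence must then place at least L - J/2 bases in each exon, and it can start at J - L + 1 positions. For J = 62 and 36-mers this gives a minimum overlap of 5 bases and 27 positions; for the 42-mers of the demonstration dataset, 11 bases and 21 positions. junction_coverage is a hypothetical helper:

```shell
#!/bin/sh
# Worked arithmetic under the assumption of a junction of even length J made of
# J/2 upstream-exon bases plus J/2 downstream-exon bases.
# Prints: <minimum bases in each exon> <number of read start positions>

junction_coverage() {
    J="$1"; L="$2"
    half=$((J / 2))
    # Outside J/2 < L <= J a read either need not cross the junction or cannot fit
    if [ "$L" -le "$half" ] || [ "$L" -gt "$J" ]; then
        echo "need J/2 < L <= J" >&2
        return 1
    fi
    min_overlap=$((L - half))   # worst case: read starts at either end
    positions=$((J - L + 1))    # every start position crosses the junction
    echo "$min_overlap $positions"
}
```

Reads longer than 62 bases cannot lie entirely within a 62-base junction sequence, which is presumably one reason the longer (60mer-150mer) junction databases are offered for some species.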


Acknowledgements

Each species image above links directly to EnsEMBL. These images were obtained from www.ensembl.org.





Button link to main ALEXA home page Button link to ALEXA-Arrays home page Button link to ALEXA-Seq home page Button link to acknowledgements of funding and other support for Malachi Griffith and Marco Marra Button link to contact information for Malachi Griffith and Marco Marra