11 January 2013
Rene L. Warren

-Summaries of the TCR and BCR sequence analyses can be found in the tcr and bcr directories, respectively.

-Data files have been placed in subdirectories whose names match the patient samples. For a breakdown of the samples and sample information, please refer to samples.tsv.

-Each subdirectory contains two files:

1) trackContigsBCRCANDIDATESwithJ.csv   (or trackContigsTCRCANDIDATESwithJ.csv
for TCR analyses)
2) insert_contigBCRCANDIDATESwithJ.sql  (or insert_contigTCRCANDIDATESwithJ.sql
for TCR analyses)
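To collect these files programmatically, the Python sketch below maps each sample subdirectory to its .csv and .sql files. It assumes a layout of <root>/<sample>/<file>; the root path and the function name find_sample_files are hypothetical. The glob patterns match both the BCR and the TCR file names.

```python
from pathlib import Path


def find_sample_files(root):
    """Map each sample subdirectory under `root` to its two result files.

    Assumes (hypothetically) the layout root/<sample>/<file>. The patterns
    match both the BCR and the TCR versions of each file name.
    """
    samples = {}
    for sample_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        csv_files = sorted(sample_dir.glob("trackContigs*CRCANDIDATESwithJ.csv"))
        sql_files = sorted(sample_dir.glob("insert_contig*CRCANDIDATESwithJ.sql"))
        samples[sample_dir.name] = {"csv": csv_files, "sql": sql_files}
    return samples
```

Plain files at the top level (such as samples.tsv) are skipped, because only directories are treated as samples.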

The first file, trackContigs*CRCANDIDATESwithJ.csv, is a comma-separated values
(csv) file that summarizes the TCR or BCR annotation for EACH INDIVIDUAL
AMPLICON. Hence, each record (line) represents a unique BCR/TCR amplicon
sequence. (Amplicon sequences were determined by Illumina MiSeq sequencing,
and the paired reads were co-assembled.)

The second file, insert_contig*CRCANDIDATESwithJ.sql, is a structured query
language (sql) file whose format is somewhat similar to csv. It is a
higher-level analysis of the information found in the first (trackContigs*.csv)
file, and each line of the .sql file represents a unique rearrangement. I have
not filtered out any information, so out-of-frame, low-depth and ambiguous
rearrangements are all listed in the file. See below for how to filter your
dataset.

The TCR/BCR clonotype data is organized in MySQL (.sql) format.
Each line corresponds to a unique, unfiltered clonotype.


The columns of the .sql file are:
+---------------+-----------------------+------+-----+---------+----------------+
| Field         | Type                  | Null | Key | Default | Extra          |
+---------------+-----------------------+------+-----+---------+----------------+
| id            | int(10) unsigned      |      | PRI | NULL    | auto_increment |
| FK_run__id    | int(10) unsigned      |      | MUL | 0       |                |
| ntSeq         | varchar(200)          | YES  | MUL | NULL    |                |
| depth         | mediumint(8) unsigned |      |     | 0       |                |
| rearrangement | varchar(200)          | YES  | MUL | NULL    |                |
| ntCDR3        | varchar(100)          |      |     |         |                |
| aaCDR3        | varchar(100)          |      |     |         |                |
| ntSeqShort    | varchar(200)          | YES  |     | NULL    |                |
| aaSeqShort    | varchar(200)          | YES  |     | NULL    |                |
| ntSeqLong     | varchar(250)          | YES  |     | NULL    |                |
| aaSeqLong     | varchar(250)          | YES  |     | NULL    |                |
| vName         | varchar(15)           | YES  |     | NULL    |                |
| vDeleted      | smallint(5) unsigned  | YES  |     | NULL    |                |
| vEnd          | smallint(5) unsigned  | YES  |     | NULL    |                |
| vFrame        | smallint(5) unsigned  | YES  |     | NULL    |                |
| jName         | varchar(15)           | YES  |     | NULL    |                |
| jDeleted      | smallint(5) unsigned  | YES  |     | NULL    |                |
| jRestSeq      | varchar(200)          | YES  |     | NULL    |                |
| frameCheck    | smallint(5) unsigned  |      |     | 0       |                |
| vPossible     | mediumint(8) unsigned | YES  |     | NULL    |                |
| ntTCRB        | text                  |      |     |         |                |
| aaTCRB        | text                  |      |     |         |                |
+---------------+-----------------------+------+-----+---------+----------------+

Columns are numbered from 0: the first field in the .sql file (id) is column 0.
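Since the filtering notes refer to columns by number, it can help to attach the names from the table to each value. The Python sketch below does this for one line of the .sql file. The exact line syntax is not described in this file, so the parser is an assumption: it accepts either a bare comma-separated tuple or a single-row MySQL INSERT ... VALUES (...); statement, with strings in single quotes and NULL for missing values. COLUMNS and parse_sql_line are illustrative names.

```python
import csv

# Column names in file order, taken from the table above (index 0 = id).
COLUMNS = [
    "id", "FK_run__id", "ntSeq", "depth", "rearrangement", "ntCDR3",
    "aaCDR3", "ntSeqShort", "aaSeqShort", "ntSeqLong", "aaSeqLong",
    "vName", "vDeleted", "vEnd", "vFrame", "jName", "jDeleted",
    "jRestSeq", "frameCheck", "vPossible", "ntTCRB", "aaTCRB",
]


def parse_sql_line(line):
    """Parse one record line into a dict keyed by column name.

    Assumed input: a bare tuple "(v0,v1,...)" or a single-row statement
    "INSERT INTO ... VALUES (v0,v1,...);". Strings are single-quoted,
    NULL becomes None, and all other values are returned as strings.
    """
    text = line.strip()
    upper = text.upper()
    if upper.startswith("INSERT"):
        # Keep only the part after the VALUES keyword.
        text = text[upper.index("VALUES") + len("VALUES"):].strip()
    text = text.rstrip(";").strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    # csv handles commas inside quoted strings and backslash escapes.
    fields = next(csv.reader([text], quotechar="'", escapechar="\\",
                             skipinitialspace=True))
    values = [None if f == "NULL" else f for f in fields]
    return dict(zip(COLUMNS, values))
```

Numeric columns such as depth come back as strings, so convert them with int() before comparing them.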

A vName containing the pipe separator "|" indicates that more than one V gene could be assigned, because the amplicon did not capture the V gene unambiguously. This is also reflected in the "rearrangement" field. For unambiguous V assignments, select records with vPossible == 1 (column 19).
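A minimal Python sketch of these two checks follows: it splits vName on "|" into its candidate V genes and tests vPossible. It assumes that each record has already been parsed into a dict keyed by column name, with values as strings or None (for NULL). The helper names are hypothetical, and the gene names in the comments are illustrative only.

```python
def v_candidates(v_name):
    """Return the candidate V genes encoded in a vName value.

    A pipe-separated value such as "TRBV6-5|TRBV6-6" (illustrative)
    means the V gene could not be assigned unambiguously.
    """
    if not v_name:
        return []
    return [v for v in v_name.split("|") if v]


def is_unambiguous_v(row):
    """True when vPossible == 1, i.e. exactly one V gene was assigned."""
    value = row.get("vPossible")
    return value is not None and int(value) == 1
```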

If you are interested only in in-frame rearrangements, select frameCheck == 1 (column 18).

If you are interested in rearrangements with a depth of 2 or more (depth being the number of amplicons that harbour the same rearrangement/clonotype), select depth >= 2 (column 3).
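The Python sketch below combines the vPossible, frameCheck and depth filters, using the zero-based column numbers given in this file. It assumes that each record has already been split into a list of 22 values in column order, with NULL represented as None. The function name passes_filters and its default thresholds are illustrative choices, not part of the released data.

```python
# Zero-based column indices of the .sql file, as given above.
DEPTH_COL = 3
FRAME_CHECK_COL = 18
V_POSSIBLE_COL = 19


def passes_filters(fields, min_depth=2, in_frame_only=True,
                   unambiguous_v_only=True):
    """Apply the depth, frameCheck and vPossible filters to one record.

    `fields` holds the column values of one .sql line in file order
    (column 0 = id). Values may be strings or ints; NULL is None.
    """
    def as_int(value):
        return None if value is None else int(value)

    depth = as_int(fields[DEPTH_COL])
    if depth is None or depth < min_depth:
        return False
    if in_frame_only and as_int(fields[FRAME_CHECK_COL]) != 1:
        return False
    if unambiguous_v_only and as_int(fields[V_POSSIBLE_COL]) != 1:
        return False
    return True
```

You can switch off each filter independently. For example, passes_filters(fields, min_depth=1) keeps single-amplicon clonotypes while still requiring an in-frame rearrangement with a single V assignment.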