Changes to cluster-1.26 command-line version

Original code provided by:
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/
M.J.L. de Hoon, S. Imoto, J. Nolan, and S. Miyano: Open Source Clustering Software. Bioinformatics, 20 (9): 1453--1454 (2004).

The following changes were implemented to retrieve the actual Pearson values from the C command-line version of the cluster software. They only work if the following options are used:

    cluster -f myfile.txt -g 2 -e 0 -m n -u myoutput.txt

Pearson values are printed to STDOUT and can be captured with a Perl parser.

Modifications were made to cluster-1.26 and are not guaranteed to work with later versions.

Note: Apparently the Python implementation already has this functionality.

All source code is found in the src/ directory of the cluster installation directory.

I) in command.c:
Added the 'n' option to the if condition and the error message, as below:

    case 'm':
    { if (strlen(argv[i+1])>1 || !strchr("mscan",argv[i+1][0]))
      { printf ("Error reading command line argument m: should be 'm', 's', 'c', 'n' or 'a'\n");
        return 0;
      }
      method = argv[i+1][0];
      break;
    }

This 'n' is passed as 'method' to 'Hierarchical' in command.c, then to 'HierarchicalCluster' in data.c, and then to 'treecluster' in cluster.c.

II) in cluster.c:
Added case 'n' to the switch statement in 'treecluster', as follows:

    switch(method)
    { case 's':
        pslcluster(nelements, distmatrix, result, linkdist);
        break;
      case 'm':
        pmlcluster(nelements, distmatrix, result, linkdist);
        break;
      case 'a':
        palcluster(nelements, distmatrix, result, linkdist);
        break;
      case 'c':
        pclcluster(nrows, ncolumns, data, mask, weight, distmatrix, dist, transpose, result, linkdist);
        break;
      case 'n': /* added case to prevent clustering entirely if specified by cluster -m n */
        break;
    }

This prevents any clustering method from being called. The tree files that are produced are therefore nonsensical and should be ignored, but compute time is reduced.

III) in cluster.c:
Added a print statement to the 'correlation' function to print tweight (the number of overlapping data points in the gene vectors being compared for the Pearson calculation). The return value for an empty overlap is -2, which serves as a "no value" marker (see IV):

    printf("%f ",tweight); /* print tweight, which should be the number of observations the Pearson distance is based on; it is printed in front of the distance (see below) */
    if (!tweight) return -2; /* usually due to empty clusters (see note below regarding -2) */

IV) in cluster.c:
Added a print statement to 'distancematrix' to print the actual Pearson value. If the distance was set to -2 (see above), a null (empty) value is printed; otherwise the Pearson value, computed as 1.0 minus the distance, is printed:

    /* Calculate the distances and save them in the ragged array */
    for (i = 0; i < n; i++){
      for (j = 0; j < i; j++){
        matrix[i][j]=metric(ndata,data,data,mask,mask,weights,i,j,transpose);
        if (matrix[i][j]==-2){
          printf("DISTANCE %d %d \n", i , j);
        }
        else{
          printf("DISTANCE %d %d %f\n",i,j,1.0 - matrix[i][j]);
        }
      }
    }
    return matrix;
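
Parsing the output (sketch):
The notes above mention capturing STDOUT with a Perl parser, which is not included here. As a rough alternative, the sketch below parses a single output line in C. It assumes each record appears on one line as the tweight value followed by "DISTANCE i j [pearson]", which is what the two printf calls in III and IV would produce when 'correlation' is called once per pair. The names PearsonRecord and parse_distance_line are hypothetical and are not part of cluster-1.26.

```c
#include <stdio.h>

/* One parsed record from the modified cluster output (hypothetical type). */
typedef struct {
    double tweight;   /* number of overlapping data points */
    int i, j;         /* indices of the two compared rows */
    double pearson;   /* Pearson value; meaningful only if has_pearson is 1 */
    int has_pearson;  /* 0 when the null (empty) value was printed */
} PearsonRecord;

/* Parse one line of the form "<tweight> DISTANCE <i> <j> [<pearson>]".
 * Returns 1 if the line is a DISTANCE record, 0 otherwise. */
int parse_distance_line(const char *line, PearsonRecord *rec)
{
    double tweight = 0.0, pearson = 0.0;
    int i = 0, j = 0;

    /* sscanf returns the number of fields converted: 4 with a Pearson
     * value, 3 when the value is missing, fewer for unrelated lines. */
    int n = sscanf(line, "%lf DISTANCE %d %d %lf", &tweight, &i, &j, &pearson);
    if (n < 3)
        return 0;

    rec->tweight = tweight;
    rec->i = i;
    rec->j = j;
    rec->has_pearson = (n == 4);
    rec->pearson = rec->has_pearson ? pearson : 0.0;
    return 1;
}
```

A caller could read STDOUT of the modified cluster line by line with fgets, pass each line to parse_distance_line, and skip lines for which it returns 0. Records with has_pearson equal to 0 correspond to pairs for which 'correlation' returned -2.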