Case Studies and Tutorials

Download 10x Genomics data files from GEO

From the GEO database, we obtain the FTP links to the data files we need. Here we use sample GSM3535276 as an example (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3535276). The sample is human AXLN1 lymphatic endothelial cells.

Supplementary file                  Size      Download     File type/resource
GSM3535276_AXLN1_barcodes.tsv.gz    33.6 Kb   (ftp)(http)  TSV
GSM3535276_AXLN1_genes.tsv.gz       251.2 Kb  (ftp)(http)  TSV
GSM3535276_AXLN1_matrix.mtx.gz      45.8 Mb   (ftp)(http)  MTX

We can use the gunzip function to download and unzip the files directly.

gunzip('https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3535nnn/GSM3535276/suppl/GSM3535276_AXLN1_matrix.mtx.gz');
gunzip('https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3535nnn/GSM3535276/suppl/GSM3535276_AXLN1_genes.tsv.gz');
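The same download can be written as a loop over all three supplementary files, including the barcodes file. This is a minimal sketch that relies only on the built-in gunzip function; the variable names (baseurl, files, outdir) and the warning identifier are illustrative choices, and the files are written to the current folder so that the import code in this tutorial can find them.

```matlab
% Sketch: download and decompress all three 10x files for sample GSM3535276.
baseurl = 'https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3535nnn/GSM3535276/suppl/';
files = {'GSM3535276_AXLN1_barcodes.tsv.gz', ...
         'GSM3535276_AXLN1_genes.tsv.gz', ...
         'GSM3535276_AXLN1_matrix.mtx.gz'};
outdir = pwd;   % write the decompressed files to the current folder

for k = 1:numel(files)
    url = [baseurl files{k}];
    try
        % gunzip accepts a URL and returns the names of the extracted files
        unzipped = gunzip(url, outdir);
        fprintf('Saved %s\n', unzipped{1});
    catch err
        % Report a failed download (e.g. no network access) and continue
        warning('demo:downloadFailed', 'Could not download %s: %s', url, err.message);
    end
end
```

Wrapping each call in try/catch lets the remaining files download even if one link is temporarily unavailable.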

We can then use the code below to import data into MATLAB.

[X,g]=sc_readmtxfile('GSM3535276_AXLN1_matrix.mtx','GSM3535276_AXLN1_genes.tsv');
scgeatool(X,g)

Process downloaded 10x Genomics data files

A 10x Genomics data folder should contain matrix.mtx and genes.tsv. Here is the command-line code for processing the raw data.

[X,g]=sc_readmtxfile('matrix.mtx','genes.tsv');
[X,g]=sc_qcfilter(X,g);
[X,g]=sc_selectg(X,g,1,0.05);
[s]=sc_tsne(X);
scgeatool(X,g,s)
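To make the gene-selection step more concrete, here is a minimal sketch in plain MATLAB of one common filtering rule: keep genes that are detected (count of at least 1) in at least a given fraction of cells. It is an assumption that sc_selectg(X,g,1,0.05) applies a rule of this kind with a 5% threshold. The toy matrix, the gene names, and the variables min_count and min_cellfrac are hypothetical, and the sketch uses a 50% threshold so that the effect is visible with only four cells.

```matlab
% Toy genes-by-cells count matrix (rows = genes, columns = cells)
X = [0 0 0 0;    % never detected
     5 0 0 0;    % detected in 1 of 4 cells
     1 2 0 3;    % detected in 3 of 4 cells
     4 4 4 4];   % detected in all 4 cells
g = ["GENE_A"; "GENE_B"; "GENE_C"; "GENE_D"];   % hypothetical gene names

min_count    = 1;     % minimum count for a gene to count as detected in a cell
min_cellfrac = 0.5;   % minimum fraction of cells in which the gene must be detected

% Number of cells in which each gene reaches the minimum count
ncells_detected = sum(X >= min_count, 2);

% Keep genes detected in enough cells
keep = ncells_detected >= min_cellfrac * size(X, 2);
Xf = X(keep, :);
gf = g(keep);

disp(gf')
```

Here only GENE_C and GENE_D pass the filter, because they are the only genes detected in at least two of the four cells.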

Download Drop-seq data files from GEO

From the GEO database, we obtain the FTP links to the data files we need. Here we use sample GSM3036814 as an example (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3036814). The sample is mouse lung cells.

Supplementary file                                                         Size    Download     File type/resource
GSM3036814_Control_6_Mouse_lung_digital_gene_expression_6000.dge.txt.gz    1.7 Mb  (ftp)(http)  TXT

We can use the gunzip function to download and unzip the file directly.

gunzip('https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3036nnn/GSM3036814/suppl/GSM3036814_Control_6_Mouse_lung_digital_gene_expression_6000.dge.txt.gz')

We can then use the code below to import data into MATLAB.

[X,g]=sc_readtsvfile('GSM3036814_Control_6_Mouse_lung_digital_gene_expression_6000.dge.txt');
[X,g]=sc_qcfilter(X,g);
[X,g]=sc_selectg(X,g,1,0.05);
[s]=sc_tsne(X);
scgeatool(X,g,s)

Import Seurat RData

For example, suppose we want to read the files from https://www.synapse.org/#!Synapse:syn22855256. According to their description, pbmc_discovery_v1.RData and pbmc_replication_v1.RData are Seurat objects containing the raw gene expression counts and log-normalized data, the phenotype Label (“CI” for MCI, “C” for control), and the inferred cell identity of the discovery and replication cohorts, respectively. The following R code exports the discovery cohort:

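library(Seurat)
library(Matrix)
# load() restores the saved Seurat object, used below as pbmc_discovery
load('pbmc_discovery_v1.RData')
# Extract the raw count matrix and write it in Matrix Market format
countMatrix <- pbmc_discovery@assays$RNA@counts
writeMM(obj = countMatrix, file = 'matrix.mtx')
# Write gene names and cell barcodes, one per line
writeLines(text = rownames(countMatrix), con = 'features.tsv')
writeLines(text = colnames(countMatrix), con = 'barcodes.tsv')
# Write the per-cell metadata, which includes the phenotype Label
metadata <- pbmc_discovery@meta.data
write.csv(x = metadata, file = 'metadata.csv', quote = FALSE)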

After exporting the Seurat object data to these files, you can use MATLAB to read them:

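% Read the count matrix, gene names, and cell barcodes
[X,genelist,barcodelist]=sc_readmtxfile('matrix.mtx','features.tsv','barcodes.tsv',1);
sce=SingleCellExperiment(X,genelist);
% Read the per-cell metadata exported from Seurat
T=readtable('metadata.csv');
% Use the phenotype Label as the batch ID
c=string(T.Label);
sce.c_batch_id=c;
scgeatool(sce)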
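Assigning T.Label directly to the cells assumes that the rows of metadata.csv are in the same order as the barcodes. The sketch below shows one way to check and, if necessary, restore that alignment using the built-in ismember function. It assumes the first column of the metadata table holds the cell barcodes (write.csv writes row names there by default); the toy barcodes, labels, and the column name Barcode are hypothetical.

```matlab
% Toy stand-ins for barcodelist and the metadata table T
barcodelist = ["AAAC-1"; "AAAG-1"; "AAAT-1"];
T = table(["AAAT-1"; "AAAC-1"; "AAAG-1"], ["CI"; "C"; "CI"], ...
          'VariableNames', {'Barcode', 'Label'});

% Locate each barcode of the count matrix in the metadata table
[found, loc] = ismember(barcodelist, string(T{:, 1}));
assert(all(found), 'Some barcodes are missing from the metadata.');

% Reorder the labels so that they follow the column order of the count matrix
c = string(T.Label(loc));
disp(c')
```

After this step, c(i) is the label of the cell in column i of the count matrix, regardless of the row order in the metadata file.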

Import data from a TSV/Excel file

If your scRNA-seq data is in an Excel file, save it as a TSV or CSV file in the following format:

genes    X1   X2   X3   X4   X5   X6   X7   X8   X9
NOC2L    1    1    2    3    3    2    0    1    3
HES4     50   15   19   50   8    87   23   25   29
ISG15    279  312  425  180  406  408  335  403  398
AGRN     3    4    9    5    2    3    8    8    9
SDF4     2    2    4    0    5    0    4    2    5
B3GALT6  2    1    0    0    1    0    1    1    0
UBE2J2   1    2    3    1    1    1    6    3    4
SCNN1D   0    1    0    0    0    0    0    0    0
ACAP3    1    3    1    0    1    0    0    1    0

Then use the function sc_readtsvfile to import the data. Here is an example:

cdgea;
[X,g]=sc_readtsvfile(fullfile('example_data','GSM3204304_P_P_Expr.csv'));
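For readers who want to see what such an import involves, here is a rough sketch that reads a file in this layout with the built-in readtable function. It is not the implementation of sc_readtsvfile. It writes a three-gene, three-cell excerpt of the table to a temporary file so that the example is self-contained; the variable names fname and cellids are illustrative.

```matlab
% Write a small tab-delimited example file (genes in rows, cells in columns)
fname = [tempname '.tsv'];
fid = fopen(fname, 'w');
fprintf(fid, 'genes\tX1\tX2\tX3\n');
fprintf(fid, 'NOC2L\t1\t1\t2\n');
fprintf(fid, 'HES4\t50\t15\t19\n');
fprintf(fid, 'ISG15\t279\t312\t425\n');
fclose(fid);

% Read the file as a table, treating the first row as column names
T = readtable(fname, 'FileType', 'text', 'Delimiter', '\t', ...
              'ReadVariableNames', true);

g = string(T{:, 1});                                   % gene names (first column)
X = table2array(T(:, 2:end));                          % genes-by-cells count matrix
cellids = string(T.Properties.VariableNames(2:end));   % cell identifiers

delete(fname);   % remove the temporary file
disp(size(X))
```

For a comma-separated file, the same call works with 'Delimiter' set to ','.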

Visualize data in 6D

cdgea;
load(fullfile('example_data','example10xdata.mat'));
% s=sc_tsne(X,6,false,true);
s=s_tsne6;    % using pre-computed 6-d embedding S_TSNE6
gui.sc_multiembeddings(s(:,1:3),s(:,4:6));

Here is what you should get:

[Figure: sixdview]
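The idea of viewing a 6-D embedding as two 3-D panels can be approximated with built-in plotting functions. The sketch below is only an illustration of that idea, not the implementation of gui.sc_multiembeddings; it uses random numbers as a placeholder for the embedding, and the panel titles are arbitrary.

```matlab
% Placeholder for a 6-D embedding: 200 cells x 6 dimensions of random values
rng(0);
s = randn(200, 6);

figure;

% Left panel: dimensions 1 to 3
ax1 = subplot(1, 2, 1);
scatter3(s(:, 1), s(:, 2), s(:, 3), 10, 'filled');
title('Dimensions 1-3');

% Right panel: dimensions 4 to 6
ax2 = subplot(1, 2, 2);
scatter3(s(:, 4), s(:, 5), s(:, 6), 10, 'filled');
title('Dimensions 4-6');
```

Each point in the left panel corresponds to the point with the same row index in the right panel, so the two panels together display all six coordinates of every cell.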