The AEnet package can be installed using the following command:
devtools::install_github(“liushang17/AEnet”)
This tutorial demonstrates the application of AEnet for single-cell
alternative splicing (AS) clustering and regulatory mechanism prediction
using full-length single-cell RNA-sequencing methods, such as
Smart-seq2. The dataset used to illustrate the utility of AEnet
comprises T cells with different status, such as naive, effector,
memory, exhaustion, and Treg (Zhang et al., 2018).
This demonstration is based on the T cells dataset, which can be downloaded from the following link: [https://drive.google.com/file/d/1OAPWcSb4CjIRewEAl88egrvmmKlT1vKj/view?usp=drive_link]
library(Seurat)
library(AEN)
library(ggsci)
library(ggrepel)
library(ggpubr)
col <- c(pal_npg("nrc")(10)[1:6],"#FAFD7CFF","#FF6F00FF",pal_lancet("lanonc")(9)[c(1,3,7)],"#660099FF","#B5CF6BFF","#B24745FF","#CCFF00FF",
"#FFCD00FF","#800000FF","#20854EFF","#616530FF","#FF410DFF","#EE4C97FF","#FF1463FF","#00FF00FF","#990080FF","#00FFFFFF",
"#666666FF","#CC33FFFF","#00D68FFF","#4775FFFF","#C5B0D5FF","#FDAE6BFF","#79CC3DFF","#996600FF","#FFCCCCFF","#0000CCFF",
"#7A65A5FF","#1A5354FF","#24325FFF")
We will load the required input files for AEnet and describe the formatting requirements for each file. ### Sample Metadata This is a user-generated tab-delimited file in which each row corresponds to a cell ID, and each column represents associated cell metadata, such as cell type or patient identifier. The only required column is ‘cell’,‘Type’,and ‘Patient’; all other columns are optional.
indir <- "/Users/shangliu/01.terms/02.AEN/04.result/CRC/input/"
outdir <- "/Users/shangliu/01.terms/02.AEN/04.result/CRC/result/"
ann <- read.table(paste0(indir,"/metadata.xls"),sep = "\t",header = T) # The cell metadata
ann <- ann[,c("allcell","cluster","Patient_ID")]
colnames(ann) <- c("cell","Type","Patient")
head(ann)
## cell Type Patient
## 1 PD10_ZZM_NP71.20180123 CD4_C05.CXCR6 P0123
## 2 PD10_ZZM_NP710.20180123 CD4_C11.IL10 P0123
## 3 PD10_ZZM_NP711.20180123 diverse.DN P0123
## 4 PD10_ZZM_NP712.20180123 diverse.other P0123
## 5 PD10_ZZM_NP713.20180123 diverse.DN P0123
## 6 PD10_ZZM_NP714.20180123 CD4_C05.CXCR6 P0123
The rows of this matrix represent splice junction coordinates, the columns correspond to cell IDs, and the values denote splice junction counts. This junction count matrix can be generated using the DESJ-detection (Liu et al., 2021) tool (https://github.com/liushang17/DESJ-detection), with the *SJ.out.tab files from STAR (v2.5.3, Dobin et al., 2013) as input. Due to the large size of the T cell dataset, we present only a subset of junctions from the Cell-Junction Count Matrix.
mat <- readRDS(paste0(indir,"/ASP.subset.rds")) # Cell-junction Count Matrix. The input should be ASP.rds
mat[1:5,1:5]
## PD10_ZZM_0701_PTR.1_36 PD10_ZZM_0701_PTR.1_37
## chr1_116760633_116764483_1 62.65773 90.20379
## chr1_116760633_116768463_1 0.00000 0.00000
## chr1_1203969_1204033_2 0.00000 0.00000
## chr1_1203969_1204054_2 0.00000 0.00000
## chr1_1212139_1212637_2 0.00000 19.18361
## PD10_ZZM_0701_PTR.1_38 PD10_ZZM_0701_PTR.1_42
## chr1_116760633_116764483_1 62.50081 75.11480
## chr1_116760633_116768463_1 0.00000 0.00000
## chr1_1203969_1204033_2 0.00000 0.00000
## chr1_1203969_1204054_2 0.00000 22.95174
## chr1_1212139_1212637_2 0.00000 0.00000
## PD10_ZZM_0701_PTR.1_45
## chr1_116760633_116764483_1 192.3182
## chr1_116760633_116768463_1 0.0000
## chr1_1203969_1204033_2 0.0000
## chr1_1203969_1204054_2 0.0000
## chr1_1212139_1212637_2 0.0000
The rows of this metadata file represent splice junction coordinates, while the columns include cell IDs, gene IDs (second column), and gene symbols (third column). The metadata can be generated using the DESJ-detection tool (Liu et al., 2021) available at https://github.com/liushang17/DESJ-detection.
aspann <- read.table(paste0(indir,"/junction_anno.xls"),sep = "\t") # The junction
head(ann)
## cell Type Patient
## 1 PD10_ZZM_NP71.20180123 CD4_C05.CXCR6 P0123
## 2 PD10_ZZM_NP710.20180123 CD4_C11.IL10 P0123
## 3 PD10_ZZM_NP711.20180123 diverse.DN P0123
## 4 PD10_ZZM_NP712.20180123 diverse.other P0123
## 5 PD10_ZZM_NP713.20180123 diverse.DN P0123
## 6 PD10_ZZM_NP714.20180123 CD4_C05.CXCR6 P0123
The rows of this matrix represent gene symbol, the columns correspond to cell IDs, and the values indicate normalized gene expression levels. Due to the large size of the T cell dataset, we present only a subset of genes from the Cell-gene Expression Matrix.
exp <- readRDS(paste0(indir,"/exp.subset.rds")) # Cell-Gene Expression Matrix. The input should be exp.rds
exp[1:5,1:5]
## PD10_ZZM_NP71.20180123 PD10_ZZM_NP710.20180123
## 10004 0.0000 0.0000
## 100048912 0.0000 0.0000
## 100049076 0.0000 0.0000
## 10010 184.6949 194.0816
## 100101467 0.0000 0.0000
## PD10_ZZM_NP714.20180123 PD10_ZZM_NP718.20180123
## 10004 0.0 4.181126
## 100048912 0.0 0.000000
## 100049076 0.0 0.000000
## 10010 2255.6 2.841733
## 100101467 0.0 0.000000
## PD10_ZZM_NP72.20180123
## 10004 0.000
## 100048912 0.000
## 100049076 0.000
## 10010 355.339
## 100101467 0.000
A predefined list of splicing factors curated from Seiler et al. (2018).
sf <- read.table(paste0(indir,"/splicing.factor.v2.txt"),sep = "\t") # The SF list
head(sf)
## V1
## 1 ACIN1
## 2 AGGF1
## 3 ALYREF
## 4 AQR
## 5 ARGLU1
## 6 BAG2
Exon–exon junctions sharing the same start or end coordinates were defined as alternative splicing patterns (ASPs), which can be identified using AEnet. Existing tools such as BRIE, Outrigger, and MARVEL rely heavily on transcriptome annotations, limiting their ability to detect unannotated alternative splicing (AS) events. Moreover, these methods exhibit limited sensitivity in identifying mutually exclusive exons (MSE). In contrast, AEnet enables the detection of a wide spectrum of splicing events—including exon skipping (ES), mutually exclusive exons (MXE), alternative 5′ splice sites (A5SS), alternative 3′ splice sites (A3SS), alternative last exons (ALE), alternative first exons (AFE), MSE, and other unannotated AS events. The output consists of three columns: the ASP event, the first junction of the ASP event, and the corresponding alternative junction.
annj <- asp(mat, outdir,n_cores=2) ##### ASP detection
head(annj)
## all
## 1: chr1_116760633_116764483_1.chr1_116760633_116768463_1
## 2: chr1_1203969_1204033_2.chr1_1203969_1204054_2
## 3: chr1_1212139_1212637_2.chr1_1212139_1212991_2
## 4: chr1_12202172_12202506_1.chr1_12202172_12206736_1
## 5: chr1_12202172_12202506_1.chr1_12202172_12206739_1
## 6: chr1_12202172_12206736_1.chr1_12202172_12206739_1
## junction1 junction2
## 1: chr1_116760633_116764483_1 chr1_116760633_116768463_1
## 2: chr1_1203969_1204033_2 chr1_1203969_1204054_2
## 3: chr1_1212139_1212637_2 chr1_1212139_1212991_2
## 4: chr1_12202172_12202506_1 chr1_12202172_12206736_1
## 5: chr1_12202172_12202506_1 chr1_12202172_12206739_1
## 6: chr1_12202172_12206736_1 chr1_12202172_12206739_1
The percent spliced-in (PSI) for a specific AS event is defined as
the proportion of junction reads curated from all detected junctions
that span the same site. Notably, unlike expression values, which are
always nonnegative integers, PSI is assigned as NaN (resulting from
division by 0) when no junction reads are detected at a presupposed AS
site in a cell.
psi_matrix <- calculate_psi(mat,annj,outdir) ##### PSI calculation
psi_matrix <- readRDS(paste0(outdir,"/step1.psi.rds"))
psi_matrix[1:5,1:5]
## PD10_ZZM_0701_PTR.1_36
## chr1_100121550_100122379_2.chr1_100121550_100123209_2 NA
## chr1_100122485_100123209_2.chr1_100122485_100125881_2 NA
## chr1_100125943_100126651_2.chr1_100125943_100132749_2 NA
## chr1_100136929_100137018_1.chr1_100136929_100137023_1 NA
## chr1_100139712_100140181_1.chr1_100139712_100140407_1 NA
## PD10_ZZM_0701_PTR.1_37
## chr1_100121550_100122379_2.chr1_100121550_100123209_2 NA
## chr1_100122485_100123209_2.chr1_100122485_100125881_2 NA
## chr1_100125943_100126651_2.chr1_100125943_100132749_2 NA
## chr1_100136929_100137018_1.chr1_100136929_100137023_1 NA
## chr1_100139712_100140181_1.chr1_100139712_100140407_1 NA
## PD10_ZZM_0701_PTR.1_38
## chr1_100121550_100122379_2.chr1_100121550_100123209_2 NA
## chr1_100122485_100123209_2.chr1_100122485_100125881_2 NA
## chr1_100125943_100126651_2.chr1_100125943_100132749_2 NA
## chr1_100136929_100137018_1.chr1_100136929_100137023_1 NA
## chr1_100139712_100140181_1.chr1_100139712_100140407_1 NA
## PD10_ZZM_0701_PTR.1_42
## chr1_100121550_100122379_2.chr1_100121550_100123209_2 NA
## chr1_100122485_100123209_2.chr1_100122485_100125881_2 NA
## chr1_100125943_100126651_2.chr1_100125943_100132749_2 NA
## chr1_100136929_100137018_1.chr1_100136929_100137023_1 NA
## chr1_100139712_100140181_1.chr1_100139712_100140407_1 NA
## PD10_ZZM_0701_PTR.1_45
## chr1_100121550_100122379_2.chr1_100121550_100123209_2 NA
## chr1_100122485_100123209_2.chr1_100122485_100125881_2 NA
## chr1_100125943_100126651_2.chr1_100125943_100132749_2 NA
## chr1_100136929_100137018_1.chr1_100136929_100137023_1 NA
## chr1_100139712_100140181_1.chr1_100139712_100140407_1 NA
To mitigate this effect, for each ASP and each gene, we limit the
calculation of the correlation (Spearman by default) to cells with valid
PSI values and the gene’s expression level, retaining only those
ASP-expression correlation links (referred to as ASP-EXP links) with
significant P values. These statistically significant links indicate
potential relationships between gene expression dynamics and the usage
preference of specific ASPs across the cells. When multiple scRNA-seq
samples are available under any experimental conditions, we retain only
ASP-EXP links that share the same correlation trend and appear at a
moderately higher frequency (2 by default) to exclude batch
effect-induced artifacts.
pos <- which(colnames(exp) %in% colnames(psi_matrix))
exp <- exp[,pos]
multi(as.matrix(psi_matrix),as.matrix(exp),ann,outdir,parallel = T,inner_cores = 20)
corm <- process_cor_results(outdir) ####### final AEN network
saveRDS(corm,file = paste0(outdir,"step3.network.rds"))
corm <- readRDS(paste0(outdir,"step3.network.rds"))
To prioritize splicing events with broad regulatory influence, AEnet
ranks ASPs or Genes based on the number of associated ASP–EXP links. The
top 1,500 ASPs and Genes (by default) are selected as anchor ASPs and
Genes. Using hierarchical clustering methods, we divided the ASPs into
ASP clusters according to their Jaccard similarity, ensuring that ASPs
in the same clusters were associated with similar gene expression sets.
corm2 <- asp_selection(corm,sample_cutoff = 2,link_cutoff = 15) ######### key ASP filter
junc_list <- junction_clustering(corm2) ####### ASP cluster
saveRDS(junc_list,file = paste0(outdir,"step4.asp.cluster.rds"))
junc_list <- readRDS(paste0(outdir,"step4.asp.cluster.rds"))
ann_junc <- junc_list$asp_clusters
head(ann_junc)
## cluster
## chr1_116760633_116764483_1.chr1_116760633_116768463_1 1
## chr1_1203969_1204033_2.chr1_1203969_1204054_2 1
## chr1_1212139_1212637_2.chr1_1212139_1212991_2 1
## chr1_153543924_153544653_2.chr1_153543940_153544653_2 1
## chr10_62000322_62050887_1.chr10_62049484_62050887_1 1
## chr10_72072911_72088139_2.chr10_72086993_72088139_2 1
## symbol
## chr1_116760633_116764483_1.chr1_116760633_116768463_1 chr1_116760633_116764483_1.chr1_116760633_116768463_1
## chr1_1203969_1204033_2.chr1_1203969_1204054_2 chr1_1203969_1204033_2.chr1_1203969_1204054_2
## chr1_1212139_1212637_2.chr1_1212139_1212991_2 chr1_1212139_1212637_2.chr1_1212139_1212991_2
## chr1_153543924_153544653_2.chr1_153543940_153544653_2 chr1_153543924_153544653_2.chr1_153543940_153544653_2
## chr10_62000322_62050887_1.chr10_62049484_62050887_1 chr10_62000322_62050887_1.chr10_62049484_62050887_1
## chr10_72072911_72088139_2.chr10_72086993_72088139_2 chr10_72072911_72088139_2.chr10_72086993_72088139_2
## final_cluster PC
## chr1_116760633_116764483_1.chr1_116760633_116768463_1 1 C_1
## chr1_1203969_1204033_2.chr1_1203969_1204054_2 1 C_1
## chr1_1212139_1212637_2.chr1_1212139_1212991_2 1 C_1
## chr1_153543924_153544653_2.chr1_153543940_153544653_2 1 C_1
## chr10_62000322_62050887_1.chr10_62049484_62050887_1 1 C_1
## chr10_72072911_72088139_2.chr10_72086993_72088139_2 1 C_1
corm2 <- gene_selection(corm,aspann,sample_cutoff = 2,link_cutoff = 30) ######### key Gene filter
exp_list <- gene_clustering(corm2) ############ Gene Cluster
saveRDS(exp_list,file = paste0(outdir,"step4.gene.cluster.rds"))
exp_list <- readRDS(paste0(outdir,"step4.gene.cluster.rds"))
exp_junc <- exp_list$asp_clusters
head(exp_junc)
## cluster symbol final_cluster PC
## 26789 1 26789 1 Other
## 112487 1 112487 1 Other
## 548593 1 548593 1 Other
## 64396 1 64396 1 Other
## 9304 1 9304 1 Other
## 339745 1 339745 1 Other
To further understand the cell heterogeneity at the level of
alternative splicing, we performed cell clustering analysis based on
alternative splicing and gene expression. Specifically, we first
calculated the enrichment scores of each ASP cluster and each gene
cluster for individual cells, resulting in a cell-by-cluster enrichment
score matrix. Next, the cell-ASP and Gene cluster enrichment score
matrix was normalized using the scale function. Finally, the normalized
matrix was used as input for the FindNeighbors and FindClusters
functions of Seurat to detect cell clusters. Notably, cell clustering
can be performed based on either cell–ASP cluster enrichment score
matrix or cell–gene cluster matrix alone. Finally, cell subpopulations
were delineated using a dual-modality approach that integrates
alternative splicing patterns (ASPs) with gene expression data.
clus <- sort(unique(as.character(ann$Type)))[1:19]
pos <- which(ann$Type %in% clus)
ann <- ann[pos,]
asp_matrix <- asp_score(psi_matrix,ann_junc) ######## enrichment score for ASP
head(asp_matrix)
## Cell C_1 C_4 C_6 C_7 C_8
## 1 PD10_ZZM_0701_PTR.1_36 0.7651617 0.019556025 0.3231888 0.4404289 0.6459422
## 2 PD10_ZZM_0701_PTR.1_37 0.9311400 0.007646356 0.2563217 0.4400661 0.6654201
## 3 PD10_ZZM_0701_PTR.1_38 0.6968085 0.146969060 0.1524836 0.5055555 0.6230515
## 4 PD10_ZZM_0701_PTR.1_42 0.8457279 0.130525362 0.4014901 0.5183716 0.7063305
## 5 PD10_ZZM_0701_PTR.1_45 0.8727653 0.090909091 0.3689828 0.4513161 0.6019509
## 6 PD10_ZZM_0701_PTR.1_46 0.8750000 0.012548015 0.3336098 0.4143742 0.5513801
## C_10 C_9 C_2 C_3 C_5
## 1 0.2631260 0.6976085 0.41160264 0.3614232 0.6534235
## 2 0.3126231 0.4800638 0.06976744 1.0000000 0.6107582
## 3 0.3874257 0.6445968 0.44499724 0.7413333 0.4285714
## 4 0.3533343 0.3631858 0.39480519 0.5000000 0.9840909
## 5 0.2433251 0.7338663 0.49479167 1.0000000 0.4297924
## 6 0.4403106 0.4973281 0.10736687 1.0000000 0.5560714
exp_matrix <- gene_score(as.matrix(exp),exp_junc) ########### enrichment score for Gene
head(exp_matrix)
## Cell C_10 C_14 C_15 C_16 C_17
## 1 PD10_ZZM_NP71.20180123 7.712374 6.12770137 7.754568 5.638499 11.899828
## 2 PD10_ZZM_NP710.20180123 6.079070 3.28384560 8.262383 6.649588 7.846693
## 3 PD10_ZZM_NP714.20180123 7.416429 2.25913840 7.709099 5.613175 10.713769
## 4 PD10_ZZM_NP718.20180123 7.810252 0.07578274 8.339096 6.589445 11.585649
## 5 PD10_ZZM_NP72.20180123 7.552827 0.58720696 7.661474 7.052814 10.327933
## 6 PD10_ZZM_NP720.20180123 6.254306 6.29985971 8.588824 5.070571 11.799051
## C_18 C_1 C_2 C_3 C_4 C_5 C_6 C_7
## 1 10.369670 11.008499 9.798153 0.1467817 5.808906 0.7292171 0.000000 6.997390
## 2 9.834946 10.004460 9.252286 3.9718132 4.200238 0.4970553 5.017091 7.769986
## 3 10.463140 10.541373 9.720342 0.0000000 5.517147 0.2047265 9.111754 7.143793
## 4 9.942566 11.092271 9.529693 0.0000000 1.893927 0.3014309 0.000000 7.689511
## 5 9.917747 9.794929 9.551042 0.1605858 5.618446 0.0000000 2.105425 7.574028
## 6 10.365822 10.552429 9.306200 0.1104655 4.849603 0.4742963 0.000000 7.193568
## C_8 C_9 C_11 C_12 C_13
## 1 7.809239 10.72156 7.262900 6.744839 9.028308
## 2 8.284007 10.65389 8.391313 8.922366 7.636574
## 3 8.395090 10.80107 8.310383 6.136417 10.415384
## 4 8.755691 10.43942 8.232970 7.419110 7.615616
## 5 7.352634 10.40811 8.851015 6.937945 7.019071
## 6 7.309897 10.90207 8.079903 5.409385 7.918600
siz <- NULL
for(i in 2:ncol(asp_matrix)){
tmp <- as.numeric(as.character(as.matrix(asp_matrix[,i])))
pos <- which(tmp %in% NaN)
siz <- c(siz,pos)
}
siz <- unique(siz)
asp_matrix <- asp_matrix[-siz,]
pos <- which(asp_matrix$Cell %in% ann$cell)
asp_matrix <- asp_matrix[pos,]
pos <- which(exp_matrix$Cell %in% ann$cell)
exp_matrix <- exp_matrix[pos,]
pos <- which(asp_matrix$Cell %in% exp_matrix$Cell)
asp_matrix <- asp_matrix[pos,]
pos <- which(exp_matrix$Cell %in% asp_matrix$Cell)
exp_matrix <- exp_matrix[pos,]
res <- cell_clus(asp_matrix,exp_matrix, resolution = 0.5,min.dist = 0.1) ########### Cell clustering
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 221701
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8560
## Number of communities: 6
## Elapsed time: 0 seconds
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 12:50:54 UMAP embedding parameters a = 1.577 b = 0.8951
## 12:50:54 Read 8216 rows and found 28 numeric columns
## 12:50:54 Using Annoy for neighbor search, n_neighbors = 30
## 12:50:54 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 12:50:55 Writing NN index file to temp file /var/folders/qv/jdllzk4j5b172s6xdvsvnxlw0000gn/T//RtmpI74v2e/file18051fdcf854
## 12:50:55 Searching Annoy index using 1 thread, search_k = 3000
## 12:50:57 Annoy recall = 100%
## 12:50:58 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 12:50:59 Initializing from normalized Laplacian + noise (using RSpectra)
## 12:50:59 Commencing optimization for 500 epochs, with 352608 positive edges
## 12:51:14 Optimization finished
## Warning: No assay specified, setting assay as RNA by default.
colnames(ann)[1] <- "Cell"
ann1 <- merge(ann,res,by = "Cell")
ggplot(ann1,aes(x=as.numeric(umap_1),y=as.numeric(umap_2),color = Type))+geom_point()+
scale_color_manual(values = col) + labs(x="UMAP_1",y="UMAP_2") + theme_classic()
ggplot(ann1,aes(x=as.numeric(umap_1),y=as.numeric(umap_2),color = cluster))+geom_point()+
scale_color_manual(values = col) + labs(x="UMAP_1",y="UMAP_2") + theme_classic()
res1 <- res[order(res$cluster),]
res2 <- data.frame(clus = res1$cluster)
rownames(res2) <- res1$Cell
rownames(asp_matrix) <- asp_matrix$Cell
asp_mat <- asp_matrix[, setdiff(colnames(asp_matrix), "Cell"), drop = FALSE]
asp_mat <- asp_mat[rownames(res2),]
pheatmap::pheatmap(asp_mat,cluster_rows = F,cluster_cols = F,annotation_row = res2,show_rownames = F,scale = "column")
The ARI and NMI values represent the concordance between the clusters identified by AEnet and the ground truth as reported in the original study.
siz <- NULL
pin <- c(0.9,0.95,1,1.05,1.1,1.15,1.2,1.25,1.3,1.35)
for(i in pin){
res <- cell_clus(asp_matrix,exp_matrix,resolution = i,scale = FALSE)
colnames(ann)[1] <- "Cell"
ann1 <- merge(ann,res,by = "Cell")
if (require("mclust")) {
tmp <- adjustedRandIndex(ann1$Type,ann1$cluster)
}
siz <- c(siz,tmp)
}
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 279224
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7702
## Number of communities: 7
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 279224
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7628
## Number of communities: 7
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 279224
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7554
## Number of communities: 7
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 279224
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7480
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 279224
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7415
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 279224
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7349
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 279224
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7282
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 279224
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7216
## Number of communities: 8
## Elapsed time: 1 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 279224
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7148
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 279224
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7081
## Number of communities: 8
## Elapsed time: 0 seconds
siz
## [1] 0.3941760 0.3943857 0.4005765 0.4164451 0.4178772 0.4194442 0.4215663
## [8] 0.4196046 0.4203675 0.4216739
siz <- NULL
pin <- c(0.9,0.95,1,1.05,1.1,1.15,1.2,1.25,1.3,1.35)
for(i in pin){
res <- cell_clus(NULL,exp_matrix,resolution = i,scale = FALSE)
colnames(ann)[1] <- "Cell"
ann1 <- merge(ann,res,by = "Cell")
if (require("aricode")) {
tmp <- NMI(ann1$Type,ann1$cluster)
}
siz <- c(siz,tmp)
}
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 289573
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7876
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 289573
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7811
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 289573
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7754
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 289573
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7697
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 289573
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7640
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 289573
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7581
## Number of communities: 11
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 289573
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7524
## Number of communities: 12
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 289573
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7472
## Number of communities: 13
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 289573
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7415
## Number of communities: 14
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8216
## Number of edges: 289573
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7371
## Number of communities: 14
## Elapsed time: 0 seconds
siz
## [1] 0.4739873 0.4690200 0.4701775 0.4700163 0.4684516 0.4711230 0.4729448
## [8] 0.4752702 0.4799668 0.4787811
To identify key splicing factors, AEnet integrated the AEN network with a predefined list of splicing factors and a specific set of ASP events. This allowed us to construct a sub-network of ASP-splicing factors, facilitating the exploration of the regulatory relationships between alternative splicing events and splicing factors. The higher the degree strength of the splicing factor, the more it indicated a close relationship with the occurrence of these ASP events. The regulatory direction of splicing factors on the given set of ASPs was determined based on the proportion of positive or negative correlations. Specifically, if 75% or more of the splicing factor–ASP pairs were positive (or negative), the splicing factor was classified as positively (or negatively) regulating the ASP sets. AEnet also identified key pathways associated with different ASP patterns. For each ASP event, gene sets that were positively and negatively correlated with the event were identified based on the AEN network. Functional enrichment analysis was then performed on both the ASP’s gene and the positively (or negatively) correlated gene sets. The pathway to which the ASP gene belongs is considered the most likely pathway affected by the ASP change.
sfname <- sf$V1
geneinfo <- read.table(paste0(indir,"/Gene.info"),sep = "\t",header = T)
pos <- which(geneinfo$symbol %in% sfname)
sfid <- geneinfo$geneID[pos]
sf <- key_sf(corm,as.character(sfid),ann_junc,cluster = "C_3")
sf1 <- sf[1:10,]
colnames(sf1)[1] <- c("geneID")
sf2 <- merge(sf1,geneinfo,by = "geneID")
sf2 <- sf2[order(-sf2$importance),]
sf2$symbol <- factor(sf2$symbol,levels = as.character(sf2$symbol))
ggplot(sf2,aes(x=symbol,y=importance))+geom_bar(stat = "identity")+theme_classic()