The AEnet package can be installed using the following command:
devtools::install_github(“liushang17/AEnet”)
This tutorial demonstrates the application of AEnet for single-cell
alternative splicing (AS) clustering and regulatory mechanism prediction
using full-length single-cell RNA-sequencing methods, such as
Smart-seq2. The dataset used to illustrate the utility of AEnet
comprises induced pluripotent stem cells (iPSCs), iPSC-derived motor
neurons (MNs), and neuronal progenitor cells (NPCs) (Song et al.,
2017).
This demonstration is based on the iPSC dataset, which can be downloaded from the following link: [https://drive.google.com/file/d/1Gk9FxlXIymV4HhpOnMWGlBXUhk4xs4q0/view?usp=drive_link]
library(Seurat)
library(AEN)
library(ggsci)
library(ggrepel)
library(ggpubr)
col <- c(pal_npg("nrc")(10)[1:6],"#FAFD7CFF","#FF6F00FF",pal_lancet("lanonc")(9)[c(1,3,7)],"#660099FF","#B5CF6BFF","#B24745FF","#CCFF00FF",
"#FFCD00FF","#800000FF","#20854EFF","#616530FF","#FF410DFF","#EE4C97FF","#FF1463FF","#00FF00FF","#990080FF","#00FFFFFF",
"#666666FF","#CC33FFFF","#00D68FFF","#4775FFFF","#C5B0D5FF","#FDAE6BFF","#79CC3DFF","#996600FF","#FFCCCCFF","#0000CCFF",
"#7A65A5FF","#1A5354FF","#24325FFF")
We will load the required input files for AEnet and describe the formatting requirements for each file.
This is a user-generated tab-delimited file in which each row corresponds to a cell ID, and each column represents associated cell metadata, such as cell type or patient identifier. The only required column is ‘cell’,‘Type’,and ‘Patient’; all other columns are optional.
indir <- "/Users/shangliu/01.terms/02.AEN/04.result/iPSC/input/"
outdir <- "/Users/shangliu/01.terms/02.AEN/04.result/iPSC/Result/"
ann <- read.table(paste0(indir,"/meta_data.xls"),sep = "\t") # The cell metadata
colnames(ann) <- c("cell","Type")
ann$Patient <- "Cell_line"
head(ann)
## cell Type Patient
## 1 SRR4047420 iPSC Cell_line
## 2 SRR4047449 iPSC Cell_line
## 3 SRR4047419 iPSC Cell_line
## 4 SRR4047448 iPSC Cell_line
## 5 SRR4047398 iPSC Cell_line
## 6 SRR4047446 iPSC Cell_line
The rows of this matrix represent splice junction coordinates, the columns correspond to cell IDs, and the values denote splice junction counts. This junction count matrix can be generated using the DESJ-detection (Liu et al., 2021) tool (https://github.com/liushang17/DESJ-detection), with the *SJ.out.tab files from STAR (v2.5.3, Dobin et al., 2013) as input.
mat <- readRDS(paste0(indir,"/ASP.rds")) # Cell-junction Count Matrix
mat[1:5,1:5]
## SRR4047245 SRR4047246 SRR4047247 SRR4047248
## chr1_100085435_100085535_2 0 0 0 0
## chr1_100085631_100088138_2 0 0 0 0
## chr1_100088237_100102954_2 0 0 0 0
## chr1_100103084_100105766_2 0 0 0 0
## chr1_100105904_100106911_2 0 0 0 0
## SRR4047249
## chr1_100085435_100085535_2 0
## chr1_100085631_100088138_2 0
## chr1_100088237_100102954_2 0
## chr1_100103084_100105766_2 0
## chr1_100105904_100106911_2 0
The rows of this metadata file represent splice junction coordinates, while the columns include cell IDs, gene IDs (second column), and gene symbols (third column). The metadata can be generated using the DESJ-detection tool (Liu et al., 2021) available at https://github.com/liushang17/DESJ-detection.
aspann <- read.table(paste0(indir,"/junction_anno.xls"),sep = "\t") # The junction
head(ann)
## cell Type Patient
## 1 SRR4047420 iPSC Cell_line
## 2 SRR4047449 iPSC Cell_line
## 3 SRR4047419 iPSC Cell_line
## 4 SRR4047448 iPSC Cell_line
## 5 SRR4047398 iPSC Cell_line
## 6 SRR4047446 iPSC Cell_line
The rows of this matrix represent gene symbol, the columns correspond to cell IDs, and the values indicate normalized gene expression levels.
exp <- readRDS(paste0(indir,"/exp.rds")) # Cell-Gene Expression Matrix
exp[1:5,1:5]
## SRR4047245 SRR4047246 SRR4047247 SRR4047248 SRR4047249
## TSPAN6 766.11400 39.467100 0.106833 342.142000 184.010000
## TNMD 0.00000 0.000000 0.000000 0.000000 0.000000
## DPM1 11.91651 0.000000 0.000000 5.037272 33.872500
## SCYL3 0.00000 0.000000 1.398650 0.000000 0.199726
## C1orf112 0.00000 0.081765 0.000000 0.000000 19.338940
A predefined list of splicing factors curated from Seiler et al. (2018).
sf <- read.table(paste0(indir,"/splicing.factor.v2.txt"),sep = "\t") # The SF list
head(sf)
## V1
## 1 ACIN1
## 2 AGGF1
## 3 ALYREF
## 4 AQR
## 5 ARGLU1
## 6 BAG2
Exon–exon junctions sharing the same start or end coordinates were
defined as alternative splicing patterns (ASPs), which can be identified
using AEnet. Existing tools such as BRIE, Outrigger, and MARVEL rely
heavily on transcriptome annotations, limiting their ability to detect
unannotated alternative splicing (AS) events. Moreover, these methods
exhibit limited sensitivity in identifying mutually exclusive exons
(MSE). In contrast, AEnet enables the detection of a wide spectrum of
splicing events—including exon skipping (ES), mutually exclusive exons
(MXE), alternative 5′ splice sites (A5SS), alternative 3′ splice sites
(A3SS), alternative last exons (ALE), alternative first exons (AFE),
MSE, and other unannotated AS events. The output consists of three
columns: the ASP event, the first junction of the ASP event, and the
corresponding alternative junction.
annj <- asp(mat, outdir,n_cores=2) ##### ASP detection
head(annj)
## all
## 1: chr1_100139712_100140181_1.chr1_100139712_100140407_1
## 2: chr1_100140252_100140407_1.chr1_100140252_100140449_1
## 3: chr1_100144144_100146526_1.chr1_100144144_100147893_1
## 4: chr1_100874431_100876032_2.chr1_100874431_100876793_2
## 5: chr1_100874431_100876032_2.chr1_100874431_100877475_2
## 6: chr1_100874431_100876793_2.chr1_100874431_100877475_2
## junction1 junction2
## 1: chr1_100139712_100140181_1 chr1_100139712_100140407_1
## 2: chr1_100140252_100140407_1 chr1_100140252_100140449_1
## 3: chr1_100144144_100146526_1 chr1_100144144_100147893_1
## 4: chr1_100874431_100876032_2 chr1_100874431_100876793_2
## 5: chr1_100874431_100876032_2 chr1_100874431_100877475_2
## 6: chr1_100874431_100876793_2 chr1_100874431_100877475_2
The percent spliced-in (PSI) for a specific AS event is defined as
the proportion of junction reads curated from all detected junctions
that span the same site. Notably, unlike expression values, which are
always nonnegative integers, PSI is assigned as NaN (resulting from
division by 0) when no junction reads are detected at a presupposed AS
site in a cell.
psi_matrix <- calculate_psi(mat,annj,outdir) ##### PSI calculation
psi_matrix[1:5,1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
## SRR4047245 SRR4047246
## chr1_100139712_100140181_1.chr1_100139712_100140407_1 NA NA
## chr1_100140252_100140407_1.chr1_100140252_100140449_1 NA NA
## chr1_100144144_100146526_1.chr1_100144144_100147893_1 NA NA
## chr1_100874431_100876032_2.chr1_100874431_100876793_2 NA NA
## chr1_100874431_100876032_2.chr1_100874431_100877475_2 NA NA
## SRR4047247 SRR4047248
## chr1_100139712_100140181_1.chr1_100139712_100140407_1 NA NA
## chr1_100140252_100140407_1.chr1_100140252_100140449_1 NA NA
## chr1_100144144_100146526_1.chr1_100144144_100147893_1 NA NA
## chr1_100874431_100876032_2.chr1_100874431_100876793_2 NA .
## chr1_100874431_100876032_2.chr1_100874431_100877475_2 NA NA
## SRR4047249
## chr1_100139712_100140181_1.chr1_100139712_100140407_1 NA
## chr1_100140252_100140407_1.chr1_100140252_100140449_1 NA
## chr1_100144144_100146526_1.chr1_100144144_100147893_1 NA
## chr1_100874431_100876032_2.chr1_100874431_100876793_2 NA
## chr1_100874431_100876032_2.chr1_100874431_100877475_2 NA
To mitigate this effect, for each ASP and each gene, we limit the
calculation of the correlation (Spearman by default) to cells with valid
PSI values and the gene’s expression level, retaining only those
ASP-expression correlation links (referred to as ASP-EXP links) with
significant P values. These statistically significant links indicate
potential relationships between gene expression dynamics and the usage
preference of specific ASPs across the cells. When multiple scRNA-seq
samples are available under any experimental conditions, we retain only
ASP-EXP links that share the same correlation trend and appear at a
moderately higher frequency (2 by default) to exclude batch
effect-induced artifacts.
multi(as.matrix(psi_matrix),as.matrix(exp),ann,outdir,parallel = T,inner_cores = 20)
corm <- process_cor_results(outdir) ####### final AEN network
head(corm)
## junction symbol correlation
## ALS2 chr1_100139712_100140181_1.chr1_100139712_100140407_1 ALS2 0.6470604
## ELOVL5 chr1_100139712_100140181_1.chr1_100139712_100140407_1 ELOVL5 0.7297233
## PSMC4 chr1_100139712_100140181_1.chr1_100139712_100140407_1 PSMC4 0.6967113
## NSMAF chr1_100139712_100140181_1.chr1_100139712_100140407_1 NSMAF 0.5986034
## WDR37 chr1_100139712_100140181_1.chr1_100139712_100140407_1 WDR37 -0.8470620
## CP chr1_100139712_100140181_1.chr1_100139712_100140407_1 CP -0.6853935
## type1 Freq
## ALS2 chr1_100139712_100140181_1.chr1_100139712_100140407_1:ALS2 1
## ELOVL5 chr1_100139712_100140181_1.chr1_100139712_100140407_1:ELOVL5 1
## PSMC4 chr1_100139712_100140181_1.chr1_100139712_100140407_1:PSMC4 1
## NSMAF chr1_100139712_100140181_1.chr1_100139712_100140407_1:NSMAF 1
## WDR37 chr1_100139712_100140181_1.chr1_100139712_100140407_1:WDR37 1
## CP chr1_100139712_100140181_1.chr1_100139712_100140407_1:CP 1
## p.value
## ALS2 4.111438e-03
## ELOVL5 9.001118e-04
## PSMC4 1.723646e-03
## NSMAF 8.680406e-03
## WDR37 4.328949e-05
## CP 2.123616e-03
pos <- which(corm$p.value < 0.00001) ####### The high quality AEN network
corm <- corm[pos,]
saveRDS(corm,file = paste0(outdir,"step3.network.rds"))
To prioritize splicing events with broad regulatory influence, AEnet
ranks ASPs or Genes based on the number of associated ASP–EXP links. The
top 1,500 ASPs and Genes (by default) are selected as anchor ASPs and
Genes. Using hierarchical clustering methods, we divided the ASPs into
ASP clusters according to their Jaccard similarity, ensuring that ASPs
in the same clusters were associated with similar gene expression sets.
corm2 <- asp_selection(corm,sample_cutoff = 0,link_cutoff = 15) ######### key ASP filter
junc_list <- junction_clustering(corm2) ####### ASP cluster
saveRDS(junc_list,file = paste0(outdir,"step4.asp.cluster.rds"))
junc_list <- readRDS(paste0(outdir,"step4.asp.cluster.rds"))
ann_junc <- junc_list$asp_clusters
head(ann_junc)
## cluster
## chr1_154170712_154171412_2.chr1_154170712_154172028_2 1
## chr10_73388415_73396043_2.chr10_73388415_73396518_2 1
## chr11_65344070_65344579_1.chr11_65344070_65345665_1 1
## chr17_40637573_40645575_2.chr17_40642560_40645575_2 1
## chr17_40637573_40645795_2.chr17_40645620_40645795_2 1
## chr1_219179247_219193081_1.chr1_219179247_219210531_1 2
## symbol
## chr1_154170712_154171412_2.chr1_154170712_154172028_2 chr1_154170712_154171412_2.chr1_154170712_154172028_2
## chr10_73388415_73396043_2.chr10_73388415_73396518_2 chr10_73388415_73396043_2.chr10_73388415_73396518_2
## chr11_65344070_65344579_1.chr11_65344070_65345665_1 chr11_65344070_65344579_1.chr11_65344070_65345665_1
## chr17_40637573_40645575_2.chr17_40642560_40645575_2 chr17_40637573_40645575_2.chr17_40642560_40645575_2
## chr17_40637573_40645795_2.chr17_40645620_40645795_2 chr17_40637573_40645795_2.chr17_40645620_40645795_2
## chr1_219179247_219193081_1.chr1_219179247_219210531_1 chr1_219179247_219193081_1.chr1_219179247_219210531_1
## final_cluster PC
## chr1_154170712_154171412_2.chr1_154170712_154172028_2 1 C_11
## chr10_73388415_73396043_2.chr10_73388415_73396518_2 1 C_11
## chr11_65344070_65344579_1.chr11_65344070_65345665_1 1 C_11
## chr17_40637573_40645575_2.chr17_40642560_40645575_2 1 C_11
## chr17_40637573_40645795_2.chr17_40645620_40645795_2 1 C_11
## chr1_219179247_219193081_1.chr1_219179247_219210531_1 2 C_8
corm2 <- gene_selection(corm,aspann,sample_cutoff = 0) ######### key Gene filter
exp_list <- gene_clustering(corm2) ############ Gene Cluster
saveRDS(exp_list,file = paste0(outdir,"step4.gene.cluster.rds"))
exp_list <- readRDS(paste0(outdir,"step4.gene.cluster.rds"))
exp_junc <- exp_list$asp_clusters
head(exp_junc)
## symbol final_cluster PC
## 1 NAT1 1 C_1
## 11 RP11-146I2.1 1 C_1
## 14 CAPN6 1 C_1
## 21 AC005606.15 1 C_1
## 226 EFEMP1 1 C_1
## 266 C11orf52 1 C_1
To further understand the cell heterogeneity at the level of
alternative splicing, we performed cell clustering analysis based on
alternative splicing and gene expression. Specifically, we first
calculated the enrichment scores of each ASP cluster and each gene
cluster for individual cells, resulting in a cell-by-cluster enrichment
score matrix. Next, the cell-ASP and Gene cluster enrichment score
matrix was normalized using the scale function. Finally, the normalized
matrix was used as input for the FindNeighbors and FindClusters
functions of Seurat to detect cell clusters. Notably, cell clustering
can be performed based on either cell–ASP cluster enrichment score
matrix or cell–gene cluster matrix alone. Finally, cell subpopulations
were delineated using a dual-modality approach that integrates
alternative splicing patterns (ASPs) with gene expression data.
asp_matrix <- asp_score(psi_matrix,ann_junc) ######## enrichment score for ASP
head(asp_matrix)
## Cell C_11 C_8 C_10 C_4 C_12 C_1
## 1 SRR4047245 0.08608491 0.9551321 0.01996087 0.8617287 0.7570227 0.9858797
## 2 SRR4047246 0.21062596 1.0000000 0.06044914 0.7429527 0.6612679 1.0000000
## 3 SRR4047247 0.35887045 0.8835382 0.02871816 0.6769231 0.9972768 0.9872307
## 4 SRR4047248 0.20596016 0.6817536 0.03753691 0.7910676 0.9636823 0.9754716
## 5 SRR4047249 0.16116175 0.9983936 0.06629867 0.7069808 0.8784297 0.9977568
## 6 SRR4047250 0.25733766 0.9017620 0.03174387 0.8026561 0.8841495 0.9956069
## C_2 C_3 C_5 C_6 C_7 C_9
## 1 0.9876543 0.0023546267 0.9624853 0.21873567 0.2330276 0.5337075
## 2 0.9577686 0.0074172316 0.9503834 0.09966499 0.1052621 0.4047099
## 3 0.9644688 0.0001893939 0.9621649 0.17425248 0.1161242 0.4231843
## 4 0.9480051 0.0001819505 0.9678930 0.15781513 0.1826361 0.4104118
## 5 0.9541186 0.0001016260 0.9481791 0.12650853 0.1419610 0.4712944
## 6 0.9805967 0.0029298093 0.9404070 0.08925608 0.2318708 0.3424178
exp_matrix <- gene_score(as.matrix(exp),exp_junc) ########### enrichment score for Gene
head(exp_matrix)
## Cell C_1 C_12 C_19 C_20 C_21 C_22 C_23
## 1 SRR4047245 1.8354465 4.936099 1.109181 3.910767 5.509210 2.8715241 8.413911
## 2 SRR4047246 2.3462994 4.601954 1.285510 2.329790 5.127256 3.7084925 8.631250
## 3 SRR4047247 2.6568690 4.402825 1.870440 2.489268 4.484451 1.5658638 8.445817
## 4 SRR4047248 1.8150681 4.574511 2.255408 3.126092 4.925889 1.0669898 8.706608
## 5 SRR4047249 0.8865686 4.998219 1.681492 2.823278 5.188593 3.1305415 8.017769
## 6 SRR4047250 1.1310404 4.135646 0.671704 1.886983 5.353034 0.2454213 8.165901
## C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9
## 1 4.824444 9.196944 7.037113 8.323901 4.440586 8.890925 7.109179 7.951103
## 2 4.775597 9.021030 7.099886 8.540172 5.019486 9.093786 6.593973 8.166192
## 3 5.236150 9.190054 6.954381 8.136827 3.829693 8.555026 6.450728 7.949726
## 4 5.240681 8.704252 6.905120 8.124022 3.998401 8.674912 6.586134 8.135205
## 5 5.120432 8.784824 7.029907 8.300435 4.251527 8.865655 6.954750 8.095265
## 6 5.033070 8.783789 7.099179 8.434453 4.543069 9.061664 7.073977 8.162509
## C_10 C_11 C_13 C_14 C_15 C_16 C_17 C_18
## 1 3.383306 8.699141 6.832349 7.798832 6.967537 7.350886 6.692841 8.163602
## 2 4.991110 8.764397 6.736125 6.973203 6.821483 7.972849 6.597487 8.042281
## 3 4.078379 8.822916 5.980042 5.956789 6.579248 6.964547 6.060589 8.997987
## 4 3.762170 8.781351 6.528719 7.032745 6.519998 6.719146 6.031251 8.519565
## 5 4.160648 8.927161 6.807825 7.612346 6.842907 7.225829 6.870063 7.767436
## 6 4.992052 8.777048 7.021446 8.534031 7.004008 7.784936 6.731482 8.017737
res <- cell_clus(asp_matrix,exp_matrix,scale = FALSE) ########### Cell clustering
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 213
## Number of edges: 6979
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8276
## Number of communities: 3
## Elapsed time: 0 seconds
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 11:43:08 UMAP embedding parameters a = 0.115 b = 1.929
## 11:43:08 Read 213 rows and found 35 numeric columns
## 11:43:08 Using Annoy for neighbor search, n_neighbors = 30
## 11:43:08 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 11:43:08 Writing NN index file to temp file /var/folders/qv/jdllzk4j5b172s6xdvsvnxlw0000gn/T//Rtmp9w4XgC/file15f73277c2955
## 11:43:08 Searching Annoy index using 1 thread, search_k = 3000
## 11:43:08 Annoy recall = 100%
## 11:43:09 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 11:43:10 Initializing from normalized Laplacian + noise (using RSpectra)
## 11:43:10 Commencing optimization for 500 epochs, with 8456 positive edges
## 11:43:11 Optimization finished
## Warning: No assay specified, setting assay as RNA by default.
colnames(ann)[1] <- "Cell"
ann1 <- merge(ann,res,by = "Cell")
ggplot(ann1,aes(x=as.numeric(umap_1),y=as.numeric(umap_2),color = Type))+geom_point()+
scale_color_manual(values = col) + labs(x="UMAP_1",y="UMAP_2") + theme_classic()
ggplot(ann1,aes(x=as.numeric(umap_1),y=as.numeric(umap_2),color = cluster))+geom_point()+
scale_color_manual(values = col) + labs(x="UMAP_1",y="UMAP_2") + theme_classic()
res1 <- res[order(res$cluster),]
res2 <- data.frame(clus = res1$cluster)
rownames(res2) <- res1$Cell
rownames(asp_matrix) <- asp_matrix$Cell
asp_mat <- asp_matrix[, setdiff(colnames(asp_matrix), "Cell"), drop = FALSE]
asp_mat <- asp_mat[rownames(res2),]
pheatmap::pheatmap(asp_mat,cluster_rows = F,cluster_cols = F,annotation_row = res2,show_rownames = F,scale = "column")
The ARI and NMI values represent the concordance between the clusters identified by AEnet and the ground truth as reported in the original study.
siz <- NULL
pin <- c(0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55)
for(i in pin){
res <- cell_clus(asp_matrix,exp_matrix,resolution = i,scale = FALSE)
colnames(ann)[1] <- "Cell"
ann1 <- merge(ann,res,by = "Cell")
if (require("mclust")) {
tmp <- adjustedRandIndex(ann1$Type,ann1$cluster)
}
siz <- c(siz,tmp)
}
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 213
## Number of edges: 6979
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9615
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 213
## Number of edges: 6979
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9448
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 213
## Number of edges: 6979
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9280
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 213
## Number of edges: 6979
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9113
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 213
## Number of edges: 6979
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8945
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 213
## Number of edges: 6979
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8778
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 213
## Number of edges: 6979
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8611
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 213
## Number of edges: 6979
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8443
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 213
## Number of edges: 6979
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8276
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 213
## Number of edges: 6979
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8108
## Number of communities: 3
## Elapsed time: 0 seconds
siz
## [1] 0.8240717 0.8240717 0.8240717 0.8240717 0.8240717 0.8240717 0.8240717
## [8] 0.8240717 0.8240717 0.8240717
siz <- NULL
pin <- c(0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55)
for(i in pin){
res <- cell_clus(NULL,exp_matrix,resolution = i,scale = FALSE)
colnames(ann)[1] <- "Cell"
ann1 <- merge(ann,res,by = "Cell")
if (require("aricode")) {
tmp <- NMI(ann1$Type,ann1$cluster)
}
siz <- c(siz,tmp)
}
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 214
## Number of edges: 6720
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9597
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 214
## Number of edges: 6720
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9429
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 214
## Number of edges: 6720
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9261
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 214
## Number of edges: 6720
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9094
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 214
## Number of edges: 6720
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8926
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 214
## Number of edges: 6720
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8758
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 214
## Number of edges: 6720
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8591
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 214
## Number of edges: 6720
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8423
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 214
## Number of edges: 6720
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8256
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 214
## Number of edges: 6720
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8088
## Number of communities: 3
## Elapsed time: 0 seconds
siz
## [1] 0.8082428 0.8082428 0.8082428 0.8082428 0.8082428 0.8082428 0.8082428
## [8] 0.8082428 0.8082428 0.8082428
To identify key splicing factors, AEnet integrated the AEN network with a predefined list of splicing factors and a specific set of ASP events. This allowed us to construct a sub-network of ASP-splicing factors, facilitating the exploration of the regulatory relationships between alternative splicing events and splicing factors. The higher the degree strength of the splicing factor, the more it indicated a close relationship with the occurrence of these ASP events. The regulatory direction of splicing factors on the given set of ASPs was determined based on the proportion of positive or negative correlations. Specifically, if 75% or more of the splicing factor–ASP pairs were positive (or negative), the splicing factor was classified as positively (or negatively) regulating the ASP sets. AEnet also identified key pathways associated with different ASP patterns. For each ASP event, gene sets that were positively and negatively correlated with the event were identified based on the AEN network. Functional enrichment analysis was then performed on both the ASP’s gene and the positively (or negatively) correlated gene sets. The pathway to which the ASP gene belongs is considered the most likely pathway affected by the ASP change.
sfname <- sf$V1
sf <- key_sf(corm,sfname,ann_junc,cluster = "C_11")
head(sf)
## # A tibble: 6 × 4
## SF importance order direction
## <chr> <dbl> <int> <int>
## 1 ELAVL4 1 1 1
## 2 CELF4 0.88 2 1
## 3 CELF3 0.84 3 1
## 4 ELAVL2 0.84 4 1
## 5 ELAVL3 0.84 5 1
## 6 SNRPD2 0.76 6 -1
sf1 <- sf[1:10,]
sf1$SF <- factor(sf1$SF,levels = as.character(sf1$SF))
ggplot(sf1,aes(x=SF,y=importance))+geom_bar(stat = "identity")+theme_classic()