Preparation

The AEnet package can be installed using the following command: devtools::install_github(“liushang17/AEnet”)
This tutorial demonstrates the application of AEnet for single-cell alternative splicing (AS) clustering and regulatory mechanism prediction using full-length single-cell RNA-sequencing methods, such as Smart-seq2. The dataset used to illustrate the utility of AEnet comprises induced pluripotent stem cells (iPSCs), iPSC-derived motor neurons (MNs), and neuronal progenitor cells (NPCs) (Song et al., 2017).

This demonstration is based on the iPSC dataset, which can be downloaded from the following link: [https://drive.google.com/file/d/1Gk9FxlXIymV4HhpOnMWGlBXUhk4xs4q0/view?usp=drive_link]

Load Package

library(Seurat)
library(AEN)
library(ggsci)
library(ggrepel)
library(ggpubr)
col <- c(pal_npg("nrc")(10)[1:6],"#FAFD7CFF","#FF6F00FF",pal_lancet("lanonc")(9)[c(1,3,7)],"#660099FF","#B5CF6BFF","#B24745FF","#CCFF00FF",
         "#FFCD00FF","#800000FF","#20854EFF","#616530FF","#FF410DFF","#EE4C97FF","#FF1463FF","#00FF00FF","#990080FF","#00FFFFFF",
         "#666666FF","#CC33FFFF","#00D68FFF","#4775FFFF","#C5B0D5FF","#FDAE6BFF","#79CC3DFF","#996600FF","#FFCCCCFF","#0000CCFF",
         "#7A65A5FF","#1A5354FF","#24325FFF")

Input

We will load the required input files for AEnet and describe the formatting requirements for each file.

Sample Metadata

This is a user-generated tab-delimited file in which each row corresponds to a cell ID, and each column represents associated cell metadata, such as cell type or patient identifier. The only required column is ‘cell’,‘Type’,and ‘Patient’; all other columns are optional.

indir <- "/Users/shangliu/01.terms/02.AEN/04.result/iPSC/input/"
outdir <- "/Users/shangliu/01.terms/02.AEN/04.result/iPSC/Result/"
ann <- read.table(paste0(indir,"/meta_data.xls"),sep = "\t") # The cell metadata

colnames(ann) <- c("cell","Type")
ann$Patient <- "Cell_line"
head(ann)
##         cell Type   Patient
## 1 SRR4047420 iPSC Cell_line
## 2 SRR4047449 iPSC Cell_line
## 3 SRR4047419 iPSC Cell_line
## 4 SRR4047448 iPSC Cell_line
## 5 SRR4047398 iPSC Cell_line
## 6 SRR4047446 iPSC Cell_line

Splice junction counts matrix

The rows of this matrix represent splice junction coordinates, the columns correspond to cell IDs, and the values denote splice junction counts. This junction count matrix can be generated using the DESJ-detection (Liu et al., 2021) tool (https://github.com/liushang17/DESJ-detection), with the *SJ.out.tab files from STAR (v2.5.3, Dobin et al., 2013) as input.

mat <- readRDS(paste0(indir,"/ASP.rds")) # Cell-junction Count Matrix
mat[1:5,1:5]
##                            SRR4047245 SRR4047246 SRR4047247 SRR4047248
## chr1_100085435_100085535_2          0          0          0          0
## chr1_100085631_100088138_2          0          0          0          0
## chr1_100088237_100102954_2          0          0          0          0
## chr1_100103084_100105766_2          0          0          0          0
## chr1_100105904_100106911_2          0          0          0          0
##                            SRR4047249
## chr1_100085435_100085535_2          0
## chr1_100085631_100088138_2          0
## chr1_100088237_100102954_2          0
## chr1_100103084_100105766_2          0
## chr1_100105904_100106911_2          0

Splicing event metadata

  The rows of this metadata file represent splice junction coordinates, while the columns include cell IDs, gene IDs (second column), and gene symbols (third column). The metadata can be generated using the DESJ-detection tool (Liu et al., 2021) available at https://github.com/liushang17/DESJ-detection.
aspann <- read.table(paste0(indir,"/junction_anno.xls"),sep = "\t") # The junction 
head(ann)
##         cell Type   Patient
## 1 SRR4047420 iPSC Cell_line
## 2 SRR4047449 iPSC Cell_line
## 3 SRR4047419 iPSC Cell_line
## 4 SRR4047448 iPSC Cell_line
## 5 SRR4047398 iPSC Cell_line
## 6 SRR4047446 iPSC Cell_line

Gene expression matrix

The rows of this matrix represent gene symbol, the columns correspond to cell IDs, and the values indicate normalized gene expression levels.

exp <- readRDS(paste0(indir,"/exp.rds")) # Cell-Gene Expression Matrix
exp[1:5,1:5]
##          SRR4047245 SRR4047246 SRR4047247 SRR4047248 SRR4047249
## TSPAN6    766.11400  39.467100   0.106833 342.142000 184.010000
## TNMD        0.00000   0.000000   0.000000   0.000000   0.000000
## DPM1       11.91651   0.000000   0.000000   5.037272  33.872500
## SCYL3       0.00000   0.000000   1.398650   0.000000   0.199726
## C1orf112    0.00000   0.081765   0.000000   0.000000  19.338940

Splicing Factor list

A predefined list of splicing factors curated from Seiler et al. (2018).

sf <- read.table(paste0(indir,"/splicing.factor.v2.txt"),sep = "\t") # The SF list
head(sf)
##       V1
## 1  ACIN1
## 2  AGGF1
## 3 ALYREF
## 4    AQR
## 5 ARGLU1
## 6   BAG2

ASP detection

Exon–exon junctions sharing the same start or end coordinates were defined as alternative splicing patterns (ASPs), which can be identified using AEnet. Existing tools such as BRIE, Outrigger, and MARVEL rely heavily on transcriptome annotations, limiting their ability to detect unannotated alternative splicing (AS) events. Moreover, these methods exhibit limited sensitivity in identifying mutually exclusive exons (MSE). In contrast, AEnet enables the detection of a wide spectrum of splicing events—including exon skipping (ES), mutually exclusive exons (MXE), alternative 5′ splice sites (A5SS), alternative 3′ splice sites (A3SS), alternative last exons (ALE), alternative first exons (AFE), MSE, and other unannotated AS events. The output consists of three columns: the ASP event, the first junction of the ASP event, and the corresponding alternative junction.

annj <- asp(mat, outdir,n_cores=2) ##### ASP detection
head(annj)
##                                                      all
## 1: chr1_100139712_100140181_1.chr1_100139712_100140407_1
## 2: chr1_100140252_100140407_1.chr1_100140252_100140449_1
## 3: chr1_100144144_100146526_1.chr1_100144144_100147893_1
## 4: chr1_100874431_100876032_2.chr1_100874431_100876793_2
## 5: chr1_100874431_100876032_2.chr1_100874431_100877475_2
## 6: chr1_100874431_100876793_2.chr1_100874431_100877475_2
##                     junction1                  junction2
## 1: chr1_100139712_100140181_1 chr1_100139712_100140407_1
## 2: chr1_100140252_100140407_1 chr1_100140252_100140449_1
## 3: chr1_100144144_100146526_1 chr1_100144144_100147893_1
## 4: chr1_100874431_100876032_2 chr1_100874431_100876793_2
## 5: chr1_100874431_100876032_2 chr1_100874431_100877475_2
## 6: chr1_100874431_100876793_2 chr1_100874431_100877475_2

PSI calculation

The percent spliced-in (PSI) for a specific AS event is defined as the proportion of junction reads curated from all detected junctions that span the same site. Notably, unlike expression values, which are always nonnegative integers, PSI is assigned as NaN (resulting from division by 0) when no junction reads are detected at a presupposed AS site in a cell.

psi_matrix <- calculate_psi(mat,annj,outdir) ##### PSI calculation
psi_matrix[1:5,1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
##                                                       SRR4047245 SRR4047246
## chr1_100139712_100140181_1.chr1_100139712_100140407_1         NA         NA
## chr1_100140252_100140407_1.chr1_100140252_100140449_1         NA         NA
## chr1_100144144_100146526_1.chr1_100144144_100147893_1         NA         NA
## chr1_100874431_100876032_2.chr1_100874431_100876793_2         NA         NA
## chr1_100874431_100876032_2.chr1_100874431_100877475_2         NA         NA
##                                                       SRR4047247 SRR4047248
## chr1_100139712_100140181_1.chr1_100139712_100140407_1         NA         NA
## chr1_100140252_100140407_1.chr1_100140252_100140449_1         NA         NA
## chr1_100144144_100146526_1.chr1_100144144_100147893_1         NA         NA
## chr1_100874431_100876032_2.chr1_100874431_100876793_2         NA          .
## chr1_100874431_100876032_2.chr1_100874431_100877475_2         NA         NA
##                                                       SRR4047249
## chr1_100139712_100140181_1.chr1_100139712_100140407_1         NA
## chr1_100140252_100140407_1.chr1_100140252_100140449_1         NA
## chr1_100144144_100146526_1.chr1_100144144_100147893_1         NA
## chr1_100874431_100876032_2.chr1_100874431_100876793_2         NA
## chr1_100874431_100876032_2.chr1_100874431_100877475_2         NA

ASP-EXP Network

To mitigate this effect, for each ASP and each gene, we limit the calculation of the correlation (Spearman by default) to cells with valid PSI values and the gene’s expression level, retaining only those ASP-expression correlation links (referred to as ASP-EXP links) with significant P values. These statistically significant links indicate potential relationships between gene expression dynamics and the usage preference of specific ASPs across the cells. When multiple scRNA-seq samples are available under any experimental conditions, we retain only ASP-EXP links that share the same correlation trend and appear at a moderately higher frequency (2 by default) to exclude batch effect-induced artifacts.

multi(as.matrix(psi_matrix),as.matrix(exp),ann,outdir,parallel = T,inner_cores = 20)
corm <- process_cor_results(outdir) ####### final AEN network
head(corm)
##                                                     junction symbol correlation
## ALS2   chr1_100139712_100140181_1.chr1_100139712_100140407_1   ALS2   0.6470604
## ELOVL5 chr1_100139712_100140181_1.chr1_100139712_100140407_1 ELOVL5   0.7297233
## PSMC4  chr1_100139712_100140181_1.chr1_100139712_100140407_1  PSMC4   0.6967113
## NSMAF  chr1_100139712_100140181_1.chr1_100139712_100140407_1  NSMAF   0.5986034
## WDR37  chr1_100139712_100140181_1.chr1_100139712_100140407_1  WDR37  -0.8470620
## CP     chr1_100139712_100140181_1.chr1_100139712_100140407_1     CP  -0.6853935
##                                                               type1 Freq
## ALS2     chr1_100139712_100140181_1.chr1_100139712_100140407_1:ALS2    1
## ELOVL5 chr1_100139712_100140181_1.chr1_100139712_100140407_1:ELOVL5    1
## PSMC4   chr1_100139712_100140181_1.chr1_100139712_100140407_1:PSMC4    1
## NSMAF   chr1_100139712_100140181_1.chr1_100139712_100140407_1:NSMAF    1
## WDR37   chr1_100139712_100140181_1.chr1_100139712_100140407_1:WDR37    1
## CP         chr1_100139712_100140181_1.chr1_100139712_100140407_1:CP    1
##             p.value
## ALS2   4.111438e-03
## ELOVL5 9.001118e-04
## PSMC4  1.723646e-03
## NSMAF  8.680406e-03
## WDR37  4.328949e-05
## CP     2.123616e-03
pos <- which(corm$p.value < 0.00001) ####### The high quality AEN network
corm <- corm[pos,]
saveRDS(corm,file = paste0(outdir,"step3.network.rds"))

ASP & Gene Clusters

To prioritize splicing events with broad regulatory influence, AEnet ranks ASPs or Genes based on the number of associated ASP–EXP links. The top 1,500 ASPs and Genes (by default) are selected as anchor ASPs and Genes. Using hierarchical clustering methods, we divided the ASPs into ASP clusters according to their Jaccard similarity, ensuring that ASPs in the same clusters were associated with similar gene expression sets.

corm2 <- asp_selection(corm,sample_cutoff = 0,link_cutoff = 15) ######### key ASP filter
junc_list <- junction_clustering(corm2) ####### ASP cluster
saveRDS(junc_list,file = paste0(outdir,"step4.asp.cluster.rds"))
junc_list <- readRDS(paste0(outdir,"step4.asp.cluster.rds"))
ann_junc <- junc_list$asp_clusters
head(ann_junc)
##                                                       cluster
## chr1_154170712_154171412_2.chr1_154170712_154172028_2       1
## chr10_73388415_73396043_2.chr10_73388415_73396518_2         1
## chr11_65344070_65344579_1.chr11_65344070_65345665_1         1
## chr17_40637573_40645575_2.chr17_40642560_40645575_2         1
## chr17_40637573_40645795_2.chr17_40645620_40645795_2         1
## chr1_219179247_219193081_1.chr1_219179247_219210531_1       2
##                                                                                                      symbol
## chr1_154170712_154171412_2.chr1_154170712_154172028_2 chr1_154170712_154171412_2.chr1_154170712_154172028_2
## chr10_73388415_73396043_2.chr10_73388415_73396518_2     chr10_73388415_73396043_2.chr10_73388415_73396518_2
## chr11_65344070_65344579_1.chr11_65344070_65345665_1     chr11_65344070_65344579_1.chr11_65344070_65345665_1
## chr17_40637573_40645575_2.chr17_40642560_40645575_2     chr17_40637573_40645575_2.chr17_40642560_40645575_2
## chr17_40637573_40645795_2.chr17_40645620_40645795_2     chr17_40637573_40645795_2.chr17_40645620_40645795_2
## chr1_219179247_219193081_1.chr1_219179247_219210531_1 chr1_219179247_219193081_1.chr1_219179247_219210531_1
##                                                       final_cluster   PC
## chr1_154170712_154171412_2.chr1_154170712_154172028_2             1 C_11
## chr10_73388415_73396043_2.chr10_73388415_73396518_2               1 C_11
## chr11_65344070_65344579_1.chr11_65344070_65345665_1               1 C_11
## chr17_40637573_40645575_2.chr17_40642560_40645575_2               1 C_11
## chr17_40637573_40645795_2.chr17_40645620_40645795_2               1 C_11
## chr1_219179247_219193081_1.chr1_219179247_219210531_1             2  C_8
corm2 <- gene_selection(corm,aspann,sample_cutoff = 0) ######### key Gene filter
exp_list <- gene_clustering(corm2) ############ Gene Cluster
saveRDS(exp_list,file = paste0(outdir,"step4.gene.cluster.rds"))
exp_list <- readRDS(paste0(outdir,"step4.gene.cluster.rds"))
exp_junc <- exp_list$asp_clusters
head(exp_junc)
##           symbol final_cluster  PC
## 1           NAT1             1 C_1
## 11  RP11-146I2.1             1 C_1
## 14         CAPN6             1 C_1
## 21   AC005606.15             1 C_1
## 226       EFEMP1             1 C_1
## 266     C11orf52             1 C_1

Cell Clustering

To further understand the cell heterogeneity at the level of alternative splicing, we performed cell clustering analysis based on alternative splicing and gene expression. Specifically, we first calculated the enrichment scores of each ASP cluster and each gene cluster for individual cells, resulting in a cell-by-cluster enrichment score matrix. Next, the cell-ASP and Gene cluster enrichment score matrix was normalized using the scale function. Finally, the normalized matrix was used as input for the FindNeighbors and FindClusters functions of Seurat to detect cell clusters. Notably, cell clustering can be performed based on either cell–ASP cluster enrichment score matrix or cell–gene cluster matrix alone. Finally, cell subpopulations were delineated using a dual-modality approach that integrates alternative splicing patterns (ASPs) with gene expression data.

asp_matrix <- asp_score(psi_matrix,ann_junc) ######## enrichment score for ASP
head(asp_matrix)
##         Cell       C_11       C_8       C_10       C_4      C_12       C_1
## 1 SRR4047245 0.08608491 0.9551321 0.01996087 0.8617287 0.7570227 0.9858797
## 2 SRR4047246 0.21062596 1.0000000 0.06044914 0.7429527 0.6612679 1.0000000
## 3 SRR4047247 0.35887045 0.8835382 0.02871816 0.6769231 0.9972768 0.9872307
## 4 SRR4047248 0.20596016 0.6817536 0.03753691 0.7910676 0.9636823 0.9754716
## 5 SRR4047249 0.16116175 0.9983936 0.06629867 0.7069808 0.8784297 0.9977568
## 6 SRR4047250 0.25733766 0.9017620 0.03174387 0.8026561 0.8841495 0.9956069
##         C_2          C_3       C_5        C_6       C_7       C_9
## 1 0.9876543 0.0023546267 0.9624853 0.21873567 0.2330276 0.5337075
## 2 0.9577686 0.0074172316 0.9503834 0.09966499 0.1052621 0.4047099
## 3 0.9644688 0.0001893939 0.9621649 0.17425248 0.1161242 0.4231843
## 4 0.9480051 0.0001819505 0.9678930 0.15781513 0.1826361 0.4104118
## 5 0.9541186 0.0001016260 0.9481791 0.12650853 0.1419610 0.4712944
## 6 0.9805967 0.0029298093 0.9404070 0.08925608 0.2318708 0.3424178
exp_matrix <- gene_score(as.matrix(exp),exp_junc) ########### enrichment score for Gene
head(exp_matrix)
##         Cell       C_1     C_12     C_19     C_20     C_21      C_22     C_23
## 1 SRR4047245 1.8354465 4.936099 1.109181 3.910767 5.509210 2.8715241 8.413911
## 2 SRR4047246 2.3462994 4.601954 1.285510 2.329790 5.127256 3.7084925 8.631250
## 3 SRR4047247 2.6568690 4.402825 1.870440 2.489268 4.484451 1.5658638 8.445817
## 4 SRR4047248 1.8150681 4.574511 2.255408 3.126092 4.925889 1.0669898 8.706608
## 5 SRR4047249 0.8865686 4.998219 1.681492 2.823278 5.188593 3.1305415 8.017769
## 6 SRR4047250 1.1310404 4.135646 0.671704 1.886983 5.353034 0.2454213 8.165901
##        C_2      C_3      C_4      C_5      C_6      C_7      C_8      C_9
## 1 4.824444 9.196944 7.037113 8.323901 4.440586 8.890925 7.109179 7.951103
## 2 4.775597 9.021030 7.099886 8.540172 5.019486 9.093786 6.593973 8.166192
## 3 5.236150 9.190054 6.954381 8.136827 3.829693 8.555026 6.450728 7.949726
## 4 5.240681 8.704252 6.905120 8.124022 3.998401 8.674912 6.586134 8.135205
## 5 5.120432 8.784824 7.029907 8.300435 4.251527 8.865655 6.954750 8.095265
## 6 5.033070 8.783789 7.099179 8.434453 4.543069 9.061664 7.073977 8.162509
##       C_10     C_11     C_13     C_14     C_15     C_16     C_17     C_18
## 1 3.383306 8.699141 6.832349 7.798832 6.967537 7.350886 6.692841 8.163602
## 2 4.991110 8.764397 6.736125 6.973203 6.821483 7.972849 6.597487 8.042281
## 3 4.078379 8.822916 5.980042 5.956789 6.579248 6.964547 6.060589 8.997987
## 4 3.762170 8.781351 6.528719 7.032745 6.519998 6.719146 6.031251 8.519565
## 5 4.160648 8.927161 6.807825 7.612346 6.842907 7.225829 6.870063 7.767436
## 6 4.992052 8.777048 7.021446 8.534031 7.004008 7.784936 6.731482 8.017737
res <- cell_clus(asp_matrix,exp_matrix,scale = FALSE) ########### Cell clustering
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 213
## Number of edges: 6979
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8276
## Number of communities: 3
## Elapsed time: 0 seconds
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 11:43:08 UMAP embedding parameters a = 0.115 b = 1.929
## 11:43:08 Read 213 rows and found 35 numeric columns
## 11:43:08 Using Annoy for neighbor search, n_neighbors = 30
## 11:43:08 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 11:43:08 Writing NN index file to temp file /var/folders/qv/jdllzk4j5b172s6xdvsvnxlw0000gn/T//Rtmp9w4XgC/file15f73277c2955
## 11:43:08 Searching Annoy index using 1 thread, search_k = 3000
## 11:43:08 Annoy recall = 100%
## 11:43:09 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 11:43:10 Initializing from normalized Laplacian + noise (using RSpectra)
## 11:43:10 Commencing optimization for 500 epochs, with 8456 positive edges
## 11:43:11 Optimization finished
## Warning: No assay specified, setting assay as RNA by default.
colnames(ann)[1] <- "Cell"
ann1 <- merge(ann,res,by = "Cell")
ggplot(ann1,aes(x=as.numeric(umap_1),y=as.numeric(umap_2),color = Type))+geom_point()+
  scale_color_manual(values = col) + labs(x="UMAP_1",y="UMAP_2") + theme_classic()

ggplot(ann1,aes(x=as.numeric(umap_1),y=as.numeric(umap_2),color = cluster))+geom_point()+
  scale_color_manual(values = col) + labs(x="UMAP_1",y="UMAP_2") + theme_classic()

res1 <- res[order(res$cluster),]
res2 <- data.frame(clus = res1$cluster)
rownames(res2) <- res1$Cell

rownames(asp_matrix) <- asp_matrix$Cell
asp_mat <- asp_matrix[, setdiff(colnames(asp_matrix), "Cell"), drop = FALSE]
asp_mat <- asp_mat[rownames(res2),]
pheatmap::pheatmap(asp_mat,cluster_rows = F,cluster_cols = F,annotation_row = res2,show_rownames = F,scale = "column")

The ARI and NMI values represent the concordance between the clusters identified by AEnet and the ground truth as reported in the original study.

siz <- NULL
pin <- c(0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55)
for(i in pin){
  res <- cell_clus(asp_matrix,exp_matrix,resolution = i,scale = FALSE)
  colnames(ann)[1] <- "Cell"
  ann1 <- merge(ann,res,by = "Cell")

  if (require("mclust")) {
    tmp <- adjustedRandIndex(ann1$Type,ann1$cluster)
  }
  siz <- c(siz,tmp)
}
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 213
## Number of edges: 6979
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9615
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 213
## Number of edges: 6979
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9448
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 213
## Number of edges: 6979
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9280
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 213
## Number of edges: 6979
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9113
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 213
## Number of edges: 6979
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8945
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 213
## Number of edges: 6979
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8778
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 213
## Number of edges: 6979
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8611
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 213
## Number of edges: 6979
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8443
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 213
## Number of edges: 6979
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8276
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 213
## Number of edges: 6979
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8108
## Number of communities: 3
## Elapsed time: 0 seconds
siz
##  [1] 0.8240717 0.8240717 0.8240717 0.8240717 0.8240717 0.8240717 0.8240717
##  [8] 0.8240717 0.8240717 0.8240717
siz <- NULL
pin <- c(0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55)
for(i in pin){
  res <- cell_clus(NULL,exp_matrix,resolution = i,scale = FALSE)
  colnames(ann)[1] <- "Cell"
  ann1 <- merge(ann,res,by = "Cell")

  if (require("aricode")) {
    tmp <- NMI(ann1$Type,ann1$cluster)
  }
  siz <- c(siz,tmp)
}
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 214
## Number of edges: 6720
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9597
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 214
## Number of edges: 6720
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9429
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 214
## Number of edges: 6720
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9261
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 214
## Number of edges: 6720
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9094
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 214
## Number of edges: 6720
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8926
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 214
## Number of edges: 6720
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8758
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 214
## Number of edges: 6720
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8591
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 214
## Number of edges: 6720
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8423
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 214
## Number of edges: 6720
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8256
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 214
## Number of edges: 6720
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8088
## Number of communities: 3
## Elapsed time: 0 seconds
siz
##  [1] 0.8082428 0.8082428 0.8082428 0.8082428 0.8082428 0.8082428 0.8082428
##  [8] 0.8082428 0.8082428 0.8082428

The Regulatory

To identify key splicing factors, AEnet integrated the AEN network with a predefined list of splicing factors and a specific set of ASP events. This allowed us to construct a sub-network of ASP-splicing factors, facilitating the exploration of the regulatory relationships between alternative splicing events and splicing factors. The higher the degree strength of the splicing factor, the more it indicated a close relationship with the occurrence of these ASP events. The regulatory direction of splicing factors on the given set of ASPs was determined based on the proportion of positive or negative correlations. Specifically, if 75% or more of the splicing factor–ASP pairs were positive (or negative), the splicing factor was classified as positively (or negatively) regulating the ASP sets. AEnet also identified key pathways associated with different ASP patterns. For each ASP event, gene sets that were positively and negatively correlated with the event were identified based on the AEN network. Functional enrichment analysis was then performed on both the ASP’s gene and the positively (or negatively) correlated gene sets. The pathway to which the ASP gene belongs is considered the most likely pathway affected by the ASP change.

sfname <- sf$V1
sf <- key_sf(corm,sfname,ann_junc,cluster = "C_11")
head(sf)
## # A tibble: 6 × 4
##   SF     importance order direction
##   <chr>       <dbl> <int>     <int>
## 1 ELAVL4       1        1         1
## 2 CELF4        0.88     2         1
## 3 CELF3        0.84     3         1
## 4 ELAVL2       0.84     4         1
## 5 ELAVL3       0.84     5         1
## 6 SNRPD2       0.76     6        -1
sf1 <- sf[1:10,]
sf1$SF <- factor(sf1$SF,levels = as.character(sf1$SF))
ggplot(sf1,aes(x=SF,y=importance))+geom_bar(stat = "identity")+theme_classic()