T cells datasets

Preparation

The AEnet package can be installed using the following command: devtools::install_github(“liushang17/AEnet”)
This tutorial demonstrates the application of AEnet for single-cell alternative splicing (AS) clustering and regulatory mechanism prediction using full-length single-cell RNA-sequencing methods, such as Smart-seq2. The dataset used to illustrate the utility of AEnet comprises T cells with different status, such as naive, effector, memory, exhaustion, and Treg (Zhang et al., 2018).

This demonstration is based on the T cells dataset, which can be downloaded from the following link: [https://drive.google.com/file/d/1OAPWcSb4CjIRewEAl88egrvmmKlT1vKj/view?usp=drive_link]

Load Package

library(Seurat)
library(AEN)
library(ggsci)
library(ggrepel)
library(ggpubr)
col <- c(pal_npg("nrc")(10)[1:6],"#FAFD7CFF","#FF6F00FF",pal_lancet("lanonc")(9)[c(1,3,7)],"#660099FF","#B5CF6BFF","#B24745FF","#CCFF00FF",
         "#FFCD00FF","#800000FF","#20854EFF","#616530FF","#FF410DFF","#EE4C97FF","#FF1463FF","#00FF00FF","#990080FF","#00FFFFFF",
         "#666666FF","#CC33FFFF","#00D68FFF","#4775FFFF","#C5B0D5FF","#FDAE6BFF","#79CC3DFF","#996600FF","#FFCCCCFF","#0000CCFF",
         "#7A65A5FF","#1A5354FF","#24325FFF")

Input

We will load the required input files for AEnet and describe the formatting requirements for each file. ### Sample Metadata This is a user-generated tab-delimited file in which each row corresponds to a cell ID, and each column represents associated cell metadata, such as cell type or patient identifier. The only required column is ‘cell’,‘Type’,and ‘Patient’; all other columns are optional.

indir <- "/Users/shangliu/01.terms/02.AEN/04.result/CRC/input/"
outdir <- "/Users/shangliu/01.terms/02.AEN/04.result/CRC/result/"
ann <- read.table(paste0(indir,"/metadata.xls"),sep = "\t",header = T) # The cell metadata

ann <- ann[,c("allcell","cluster","Patient_ID")]
colnames(ann) <- c("cell","Type","Patient")
head(ann)

##                      cell          Type Patient
## 1  PD10_ZZM_NP71.20180123 CD4_C05.CXCR6   P0123
## 2 PD10_ZZM_NP710.20180123  CD4_C11.IL10   P0123
## 3 PD10_ZZM_NP711.20180123    diverse.DN   P0123
## 4 PD10_ZZM_NP712.20180123 diverse.other   P0123
## 5 PD10_ZZM_NP713.20180123    diverse.DN   P0123
## 6 PD10_ZZM_NP714.20180123 CD4_C05.CXCR6   P0123

Splice junction counts matrix

The rows of this matrix represent splice junction coordinates, the columns correspond to cell IDs, and the values denote splice junction counts. This junction count matrix can be generated using the DESJ-detection (Liu et al., 2021) tool (https://github.com/liushang17/DESJ-detection), with the *SJ.out.tab files from STAR (v2.5.3, Dobin et al., 2013) as input. Due to the large size of the T cell dataset, we present only a subset of junctions from the Cell-Junction Count Matrix.

mat <- readRDS(paste0(indir,"/ASP.subset.rds")) # Cell-junction Count Matrix. The input should be ASP.rds
mat[1:5,1:5]

##                            PD10_ZZM_0701_PTR.1_36 PD10_ZZM_0701_PTR.1_37
## chr1_116760633_116764483_1               62.65773               90.20379
## chr1_116760633_116768463_1                0.00000                0.00000
## chr1_1203969_1204033_2                    0.00000                0.00000
## chr1_1203969_1204054_2                    0.00000                0.00000
## chr1_1212139_1212637_2                    0.00000               19.18361
##                            PD10_ZZM_0701_PTR.1_38 PD10_ZZM_0701_PTR.1_42
## chr1_116760633_116764483_1               62.50081               75.11480
## chr1_116760633_116768463_1                0.00000                0.00000
## chr1_1203969_1204033_2                    0.00000                0.00000
## chr1_1203969_1204054_2                    0.00000               22.95174
## chr1_1212139_1212637_2                    0.00000                0.00000
##                            PD10_ZZM_0701_PTR.1_45
## chr1_116760633_116764483_1               192.3182
## chr1_116760633_116768463_1                 0.0000
## chr1_1203969_1204033_2                     0.0000
## chr1_1203969_1204054_2                     0.0000
## chr1_1212139_1212637_2                     0.0000

Splicing event metadata

  The rows of this metadata file represent splice junction coordinates, while the columns include cell IDs, gene IDs (second column), and gene symbols (third column). The metadata can be generated using the DESJ-detection tool (Liu et al., 2021) available at https://github.com/liushang17/DESJ-detection.

aspann <- read.table(paste0(indir,"/junction_anno.xls"),sep = "\t") # The junction 
head(ann)

##                      cell          Type Patient
## 1  PD10_ZZM_NP71.20180123 CD4_C05.CXCR6   P0123
## 2 PD10_ZZM_NP710.20180123  CD4_C11.IL10   P0123
## 3 PD10_ZZM_NP711.20180123    diverse.DN   P0123
## 4 PD10_ZZM_NP712.20180123 diverse.other   P0123
## 5 PD10_ZZM_NP713.20180123    diverse.DN   P0123
## 6 PD10_ZZM_NP714.20180123 CD4_C05.CXCR6   P0123

Gene expression matrix

The rows of this matrix represent gene symbol, the columns correspond to cell IDs, and the values indicate normalized gene expression levels. Due to the large size of the T cell dataset, we present only a subset of genes from the Cell-gene Expression Matrix.

exp <- readRDS(paste0(indir,"/exp.subset.rds")) # Cell-Gene Expression Matrix.  The input should be exp.rds
exp[1:5,1:5]

##           PD10_ZZM_NP71.20180123 PD10_ZZM_NP710.20180123
## 10004                     0.0000                  0.0000
## 100048912                 0.0000                  0.0000
## 100049076                 0.0000                  0.0000
## 10010                   184.6949                194.0816
## 100101467                 0.0000                  0.0000
##           PD10_ZZM_NP714.20180123 PD10_ZZM_NP718.20180123
## 10004                         0.0                4.181126
## 100048912                     0.0                0.000000
## 100049076                     0.0                0.000000
## 10010                      2255.6                2.841733
## 100101467                     0.0                0.000000
##           PD10_ZZM_NP72.20180123
## 10004                      0.000
## 100048912                  0.000
## 100049076                  0.000
## 10010                    355.339
## 100101467                  0.000

Splicing Factor list

A predefined list of splicing factors curated from Seiler et al. (2018).

sf <- read.table(paste0(indir,"/splicing.factor.v2.txt"),sep = "\t") # The SF list
head(sf)

##       V1
## 1  ACIN1
## 2  AGGF1
## 3 ALYREF
## 4    AQR
## 5 ARGLU1
## 6   BAG2

ASP detection

Exon–exon junctions sharing the same start or end coordinates were defined as alternative splicing patterns (ASPs), which can be identified using AEnet. Existing tools such as BRIE, Outrigger, and MARVEL rely heavily on transcriptome annotations, limiting their ability to detect unannotated alternative splicing (AS) events. Moreover, these methods exhibit limited sensitivity in identifying mutually exclusive exons (MSE). In contrast, AEnet enables the detection of a wide spectrum of splicing events—including exon skipping (ES), mutually exclusive exons (MXE), alternative 5′ splice sites (A5SS), alternative 3′ splice sites (A3SS), alternative last exons (ALE), alternative first exons (AFE), MSE, and other unannotated AS events. The output consists of three columns: the ASP event, the first junction of the ASP event, and the corresponding alternative junction.

annj <- asp(mat, outdir,n_cores=2) ##### ASP detection
head(annj)

##                                                      all
## 1: chr1_116760633_116764483_1.chr1_116760633_116768463_1
## 2:         chr1_1203969_1204033_2.chr1_1203969_1204054_2
## 3:         chr1_1212139_1212637_2.chr1_1212139_1212991_2
## 4:     chr1_12202172_12202506_1.chr1_12202172_12206736_1
## 5:     chr1_12202172_12202506_1.chr1_12202172_12206739_1
## 6:     chr1_12202172_12206736_1.chr1_12202172_12206739_1
##                     junction1                  junction2
## 1: chr1_116760633_116764483_1 chr1_116760633_116768463_1
## 2:     chr1_1203969_1204033_2     chr1_1203969_1204054_2
## 3:     chr1_1212139_1212637_2     chr1_1212139_1212991_2
## 4:   chr1_12202172_12202506_1   chr1_12202172_12206736_1
## 5:   chr1_12202172_12202506_1   chr1_12202172_12206739_1
## 6:   chr1_12202172_12206736_1   chr1_12202172_12206739_1

PSI calculation

The percent spliced-in (PSI) for a specific AS event is defined as the proportion of junction reads curated from all detected junctions that span the same site. Notably, unlike expression values, which are always nonnegative integers, PSI is assigned as NaN (resulting from division by 0) when no junction reads are detected at a presupposed AS site in a cell.

psi_matrix <- calculate_psi(mat,annj,outdir) ##### PSI calculation

psi_matrix <- readRDS(paste0(outdir,"/step1.psi.rds"))
psi_matrix[1:5,1:5]

##                                                       PD10_ZZM_0701_PTR.1_36
## chr1_100121550_100122379_2.chr1_100121550_100123209_2                     NA
## chr1_100122485_100123209_2.chr1_100122485_100125881_2                     NA
## chr1_100125943_100126651_2.chr1_100125943_100132749_2                     NA
## chr1_100136929_100137018_1.chr1_100136929_100137023_1                     NA
## chr1_100139712_100140181_1.chr1_100139712_100140407_1                     NA
##                                                       PD10_ZZM_0701_PTR.1_37
## chr1_100121550_100122379_2.chr1_100121550_100123209_2                     NA
## chr1_100122485_100123209_2.chr1_100122485_100125881_2                     NA
## chr1_100125943_100126651_2.chr1_100125943_100132749_2                     NA
## chr1_100136929_100137018_1.chr1_100136929_100137023_1                     NA
## chr1_100139712_100140181_1.chr1_100139712_100140407_1                     NA
##                                                       PD10_ZZM_0701_PTR.1_38
## chr1_100121550_100122379_2.chr1_100121550_100123209_2                     NA
## chr1_100122485_100123209_2.chr1_100122485_100125881_2                     NA
## chr1_100125943_100126651_2.chr1_100125943_100132749_2                     NA
## chr1_100136929_100137018_1.chr1_100136929_100137023_1                     NA
## chr1_100139712_100140181_1.chr1_100139712_100140407_1                     NA
##                                                       PD10_ZZM_0701_PTR.1_42
## chr1_100121550_100122379_2.chr1_100121550_100123209_2                     NA
## chr1_100122485_100123209_2.chr1_100122485_100125881_2                     NA
## chr1_100125943_100126651_2.chr1_100125943_100132749_2                     NA
## chr1_100136929_100137018_1.chr1_100136929_100137023_1                     NA
## chr1_100139712_100140181_1.chr1_100139712_100140407_1                     NA
##                                                       PD10_ZZM_0701_PTR.1_45
## chr1_100121550_100122379_2.chr1_100121550_100123209_2                     NA
## chr1_100122485_100123209_2.chr1_100122485_100125881_2                     NA
## chr1_100125943_100126651_2.chr1_100125943_100132749_2                     NA
## chr1_100136929_100137018_1.chr1_100136929_100137023_1                     NA
## chr1_100139712_100140181_1.chr1_100139712_100140407_1                     NA

ASP-EXP Network

To mitigate this effect, for each ASP and each gene, we limit the calculation of the correlation (Spearman by default) to cells with valid PSI values and the gene’s expression level, retaining only those ASP-expression correlation links (referred to as ASP-EXP links) with significant P values. These statistically significant links indicate potential relationships between gene expression dynamics and the usage preference of specific ASPs across the cells. When multiple scRNA-seq samples are available under any experimental conditions, we retain only ASP-EXP links that share the same correlation trend and appear at a moderately higher frequency (2 by default) to exclude batch effect-induced artifacts.

pos <- which(colnames(exp) %in% colnames(psi_matrix))
exp <- exp[,pos]
multi(as.matrix(psi_matrix),as.matrix(exp),ann,outdir,parallel = T,inner_cores = 20)

corm <- process_cor_results(outdir) ####### final AEN network
saveRDS(corm,file = paste0(outdir,"step3.network.rds"))

corm <- readRDS(paste0(outdir,"step3.network.rds"))

ASP & Gene Clusters

To prioritize splicing events with broad regulatory influence, AEnet ranks ASPs or Genes based on the number of associated ASP–EXP links. The top 1,500 ASPs and Genes (by default) are selected as anchor ASPs and Genes. Using hierarchical clustering methods, we divided the ASPs into ASP clusters according to their Jaccard similarity, ensuring that ASPs in the same clusters were associated with similar gene expression sets.

corm2 <- asp_selection(corm,sample_cutoff = 2,link_cutoff = 15) ######### key ASP filter
junc_list <- junction_clustering(corm2) ####### ASP cluster
saveRDS(junc_list,file = paste0(outdir,"step4.asp.cluster.rds"))

junc_list <- readRDS(paste0(outdir,"step4.asp.cluster.rds"))
ann_junc <- junc_list$asp_clusters
head(ann_junc)

##                                                       cluster
## chr1_116760633_116764483_1.chr1_116760633_116768463_1       1
## chr1_1203969_1204033_2.chr1_1203969_1204054_2               1
## chr1_1212139_1212637_2.chr1_1212139_1212991_2               1
## chr1_153543924_153544653_2.chr1_153543940_153544653_2       1
## chr10_62000322_62050887_1.chr10_62049484_62050887_1         1
## chr10_72072911_72088139_2.chr10_72086993_72088139_2         1
##                                                                                                      symbol
## chr1_116760633_116764483_1.chr1_116760633_116768463_1 chr1_116760633_116764483_1.chr1_116760633_116768463_1
## chr1_1203969_1204033_2.chr1_1203969_1204054_2                 chr1_1203969_1204033_2.chr1_1203969_1204054_2
## chr1_1212139_1212637_2.chr1_1212139_1212991_2                 chr1_1212139_1212637_2.chr1_1212139_1212991_2
## chr1_153543924_153544653_2.chr1_153543940_153544653_2 chr1_153543924_153544653_2.chr1_153543940_153544653_2
## chr10_62000322_62050887_1.chr10_62049484_62050887_1     chr10_62000322_62050887_1.chr10_62049484_62050887_1
## chr10_72072911_72088139_2.chr10_72086993_72088139_2     chr10_72072911_72088139_2.chr10_72086993_72088139_2
##                                                       final_cluster  PC
## chr1_116760633_116764483_1.chr1_116760633_116768463_1             1 C_1
## chr1_1203969_1204033_2.chr1_1203969_1204054_2                     1 C_1
## chr1_1212139_1212637_2.chr1_1212139_1212991_2                     1 C_1
## chr1_153543924_153544653_2.chr1_153543940_153544653_2             1 C_1
## chr10_62000322_62050887_1.chr10_62049484_62050887_1               1 C_1
## chr10_72072911_72088139_2.chr10_72086993_72088139_2               1 C_1

corm2 <- gene_selection(corm,aspann,sample_cutoff = 2,link_cutoff = 30) ######### key Gene filter
exp_list <- gene_clustering(corm2) ############ Gene Cluster
saveRDS(exp_list,file = paste0(outdir,"step4.gene.cluster.rds"))

exp_list <- readRDS(paste0(outdir,"step4.gene.cluster.rds"))
exp_junc <- exp_list$asp_clusters
head(exp_junc)

##        cluster symbol final_cluster    PC
## 26789        1  26789             1 Other
## 112487       1 112487             1 Other
## 548593       1 548593             1 Other
## 64396        1  64396             1 Other
## 9304         1   9304             1 Other
## 339745       1 339745             1 Other

Cell Clustering

To further understand the cell heterogeneity at the level of alternative splicing, we performed cell clustering analysis based on alternative splicing and gene expression. Specifically, we first calculated the enrichment scores of each ASP cluster and each gene cluster for individual cells, resulting in a cell-by-cluster enrichment score matrix. Next, the cell-ASP and Gene cluster enrichment score matrix was normalized using the scale function. Finally, the normalized matrix was used as input for the FindNeighbors and FindClusters functions of Seurat to detect cell clusters. Notably, cell clustering can be performed based on either cell–ASP cluster enrichment score matrix or cell–gene cluster matrix alone. Finally, cell subpopulations were delineated using a dual-modality approach that integrates alternative splicing patterns (ASPs) with gene expression data.

clus <- sort(unique(as.character(ann$Type)))[1:19]
pos <- which(ann$Type %in% clus)
ann <- ann[pos,]

asp_matrix <- asp_score(psi_matrix,ann_junc) ######## enrichment score for ASP
head(asp_matrix)

##                     Cell       C_1         C_4       C_6       C_7       C_8
## 1 PD10_ZZM_0701_PTR.1_36 0.7651617 0.019556025 0.3231888 0.4404289 0.6459422
## 2 PD10_ZZM_0701_PTR.1_37 0.9311400 0.007646356 0.2563217 0.4400661 0.6654201
## 3 PD10_ZZM_0701_PTR.1_38 0.6968085 0.146969060 0.1524836 0.5055555 0.6230515
## 4 PD10_ZZM_0701_PTR.1_42 0.8457279 0.130525362 0.4014901 0.5183716 0.7063305
## 5 PD10_ZZM_0701_PTR.1_45 0.8727653 0.090909091 0.3689828 0.4513161 0.6019509
## 6 PD10_ZZM_0701_PTR.1_46 0.8750000 0.012548015 0.3336098 0.4143742 0.5513801
##        C_10       C_9        C_2       C_3       C_5
## 1 0.2631260 0.6976085 0.41160264 0.3614232 0.6534235
## 2 0.3126231 0.4800638 0.06976744 1.0000000 0.6107582
## 3 0.3874257 0.6445968 0.44499724 0.7413333 0.4285714
## 4 0.3533343 0.3631858 0.39480519 0.5000000 0.9840909
## 5 0.2433251 0.7338663 0.49479167 1.0000000 0.4297924
## 6 0.4403106 0.4973281 0.10736687 1.0000000 0.5560714

exp_matrix <- gene_score(as.matrix(exp),exp_junc) ########### enrichment score for Gene
head(exp_matrix)

##                      Cell     C_10       C_14     C_15     C_16      C_17
## 1  PD10_ZZM_NP71.20180123 7.712374 6.12770137 7.754568 5.638499 11.899828
## 2 PD10_ZZM_NP710.20180123 6.079070 3.28384560 8.262383 6.649588  7.846693
## 3 PD10_ZZM_NP714.20180123 7.416429 2.25913840 7.709099 5.613175 10.713769
## 4 PD10_ZZM_NP718.20180123 7.810252 0.07578274 8.339096 6.589445 11.585649
## 5  PD10_ZZM_NP72.20180123 7.552827 0.58720696 7.661474 7.052814 10.327933
## 6 PD10_ZZM_NP720.20180123 6.254306 6.29985971 8.588824 5.070571 11.799051
##        C_18       C_1      C_2       C_3      C_4       C_5      C_6      C_7
## 1 10.369670 11.008499 9.798153 0.1467817 5.808906 0.7292171 0.000000 6.997390
## 2  9.834946 10.004460 9.252286 3.9718132 4.200238 0.4970553 5.017091 7.769986
## 3 10.463140 10.541373 9.720342 0.0000000 5.517147 0.2047265 9.111754 7.143793
## 4  9.942566 11.092271 9.529693 0.0000000 1.893927 0.3014309 0.000000 7.689511
## 5  9.917747  9.794929 9.551042 0.1605858 5.618446 0.0000000 2.105425 7.574028
## 6 10.365822 10.552429 9.306200 0.1104655 4.849603 0.4742963 0.000000 7.193568
##        C_8      C_9     C_11     C_12      C_13
## 1 7.809239 10.72156 7.262900 6.744839  9.028308
## 2 8.284007 10.65389 8.391313 8.922366  7.636574
## 3 8.395090 10.80107 8.310383 6.136417 10.415384
## 4 8.755691 10.43942 8.232970 7.419110  7.615616
## 5 7.352634 10.40811 8.851015 6.937945  7.019071
## 6 7.309897 10.90207 8.079903 5.409385  7.918600

siz <- NULL
for(i in 2:ncol(asp_matrix)){
  tmp <- as.numeric(as.character(as.matrix(asp_matrix[,i])))
  pos <- which(tmp %in% NaN)
  siz <- c(siz,pos)
}
siz <- unique(siz)
asp_matrix <- asp_matrix[-siz,]

pos <- which(asp_matrix$Cell %in% ann$cell)
asp_matrix <- asp_matrix[pos,]
pos <- which(exp_matrix$Cell %in% ann$cell)
exp_matrix <- exp_matrix[pos,]

pos <- which(asp_matrix$Cell %in% exp_matrix$Cell)
asp_matrix <- asp_matrix[pos,]
pos <- which(exp_matrix$Cell %in% asp_matrix$Cell)
exp_matrix <- exp_matrix[pos,]

res <- cell_clus(asp_matrix,exp_matrix, resolution = 0.5,min.dist = 0.1) ########### Cell clustering

## Computing nearest neighbor graph

## Computing SNN

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 221701
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8560
## Number of communities: 6
## Elapsed time: 0 seconds

## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session

## 12:50:54 UMAP embedding parameters a = 1.577 b = 0.8951

## 12:50:54 Read 8216 rows and found 28 numeric columns

## 12:50:54 Using Annoy for neighbor search, n_neighbors = 30

## 12:50:54 Building Annoy index with metric = cosine, n_trees = 50

## 0%   10   20   30   40   50   60   70   80   90   100%

## [----|----|----|----|----|----|----|----|----|----|

## **************************************************|
## 12:50:55 Writing NN index file to temp file /var/folders/qv/jdllzk4j5b172s6xdvsvnxlw0000gn/T//RtmpI74v2e/file18051fdcf854
## 12:50:55 Searching Annoy index using 1 thread, search_k = 3000
## 12:50:57 Annoy recall = 100%
## 12:50:58 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 12:50:59 Initializing from normalized Laplacian + noise (using RSpectra)
## 12:50:59 Commencing optimization for 500 epochs, with 352608 positive edges
## 12:51:14 Optimization finished

## Warning: No assay specified, setting assay as RNA by default.

colnames(ann)[1] <- "Cell"
ann1 <- merge(ann,res,by = "Cell")
ggplot(ann1,aes(x=as.numeric(umap_1),y=as.numeric(umap_2),color = Type))+geom_point()+
  scale_color_manual(values = col) + labs(x="UMAP_1",y="UMAP_2") + theme_classic()

ggplot(ann1,aes(x=as.numeric(umap_1),y=as.numeric(umap_2),color = cluster))+geom_point()+
  scale_color_manual(values = col) + labs(x="UMAP_1",y="UMAP_2") + theme_classic()

res1 <- res[order(res$cluster),]
res2 <- data.frame(clus = res1$cluster)
rownames(res2) <- res1$Cell

rownames(asp_matrix) <- asp_matrix$Cell
asp_mat <- asp_matrix[, setdiff(colnames(asp_matrix), "Cell"), drop = FALSE]
asp_mat <- asp_mat[rownames(res2),]
pheatmap::pheatmap(asp_mat,cluster_rows = F,cluster_cols = F,annotation_row = res2,show_rownames = F,scale = "column")

The ARI and NMI values represent the concordance between the clusters identified by AEnet and the ground truth as reported in the original study.

siz <- NULL
pin <- c(0.9,0.95,1,1.05,1.1,1.15,1.2,1.25,1.3,1.35)
for(i in pin){
  res <- cell_clus(asp_matrix,exp_matrix,resolution = i,scale = FALSE)
  colnames(ann)[1] <- "Cell"
  ann1 <- merge(ann,res,by = "Cell")

  if (require("mclust")) {
    tmp <- adjustedRandIndex(ann1$Type,ann1$cluster)
  }
  siz <- c(siz,tmp)
}

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 279224
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7702
## Number of communities: 7
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 279224
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7628
## Number of communities: 7
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 279224
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7554
## Number of communities: 7
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 279224
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7480
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 279224
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7415
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 279224
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7349
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 279224
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7282
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 279224
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7216
## Number of communities: 8
## Elapsed time: 1 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 279224
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7148
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 279224
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7081
## Number of communities: 8
## Elapsed time: 0 seconds

siz

##  [1] 0.3941760 0.3943857 0.4005765 0.4164451 0.4178772 0.4194442 0.4215663
##  [8] 0.4196046 0.4203675 0.4216739

siz <- NULL
pin <- c(0.9,0.95,1,1.05,1.1,1.15,1.2,1.25,1.3,1.35)
for(i in pin){
  res <- cell_clus(NULL,exp_matrix,resolution = i,scale = FALSE)
  colnames(ann)[1] <- "Cell"
  ann1 <- merge(ann,res,by = "Cell")

  if (require("aricode")) {
    tmp <- NMI(ann1$Type,ann1$cluster)
  }
  siz <- c(siz,tmp)
}

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 289573
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7876
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 289573
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7811
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 289573
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7754
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 289573
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7697
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 289573
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7640
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 289573
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7581
## Number of communities: 11
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 289573
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7524
## Number of communities: 12
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 289573
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7472
## Number of communities: 13
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 289573
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7415
## Number of communities: 14
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8216
## Number of edges: 289573
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7371
## Number of communities: 14
## Elapsed time: 0 seconds

siz

##  [1] 0.4739873 0.4690200 0.4701775 0.4700163 0.4684516 0.4711230 0.4729448
##  [8] 0.4752702 0.4799668 0.4787811

The Regulatory

To identify key splicing factors, AEnet integrated the AEN network with a predefined list of splicing factors and a specific set of ASP events. This allowed us to construct a sub-network of ASP-splicing factors, facilitating the exploration of the regulatory relationships between alternative splicing events and splicing factors. The higher the degree strength of the splicing factor, the more it indicated a close relationship with the occurrence of these ASP events. The regulatory direction of splicing factors on the given set of ASPs was determined based on the proportion of positive or negative correlations. Specifically, if 75% or more of the splicing factor–ASP pairs were positive (or negative), the splicing factor was classified as positively (or negatively) regulating the ASP sets. AEnet also identified key pathways associated with different ASP patterns. For each ASP event, gene sets that were positively and negatively correlated with the event were identified based on the AEN network. Functional enrichment analysis was then performed on both the ASP’s gene and the positively (or negatively) correlated gene sets. The pathway to which the ASP gene belongs is considered the most likely pathway affected by the ASP change.

sfname <- sf$V1
geneinfo <- read.table(paste0(indir,"/Gene.info"),sep = "\t",header = T)

pos <- which(geneinfo$symbol %in% sfname)
sfid <- geneinfo$geneID[pos]

sf <- key_sf(corm,as.character(sfid),ann_junc,cluster = "C_3")

sf1 <- sf[1:10,]
colnames(sf1)[1] <- c("geneID")
sf2 <- merge(sf1,geneinfo,by = "geneID")
sf2 <- sf2[order(-sf2$importance),]
sf2$symbol <- factor(sf2$symbol,levels = as.character(sf2$symbol))

ggplot(sf2,aes(x=symbol,y=importance))+geom_bar(stat = "identity")+theme_classic()