6 changes: 5 additions & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -22,7 +22,11 @@ repos:
hooks:
- id: mypy
name: Static type checking with MyPy
args: [--ignore-missing-imports]
exclude: ^bioneuralnet/network/pysmccnet/
args: [
--ignore-missing-imports,
--follow-imports=silent,
]

- repo: local
hooks:
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,48 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/).

## [1.3.0] - 2026-04-01

### Network Module (`bioneuralnet.network`)
- **New dedicated module**: Network construction and analysis moved from `bioneuralnet.utils` to `bioneuralnet.network`.
- **Renamed construction functions**: `gen_similarity_graph` -> `similarity_network`, `gen_correlation_graph` -> `correlation_network`, `gen_threshold_graph` -> `threshold_network`, `gen_gaussian_knn_graph` -> `gaussian_knn_network`.
- **`NetworkAnalyzer`**: Moved to `bioneuralnet.network`; GPU-accelerated via PyTorch; added `hub_analysis`, `cross_omics_analysis`, `edge_weight_analysis`, `find_strongest_edges`, `degree_distribution`, `clustering_coefficient_gpu`, `connected_components`.
- **`auto_pysmccnet`**: Phenotype-driven network construction via SmCCNet 2.0; supports CCA and PLS modes and is now implemented in native Python, removing the R dependency and simplifying setup.
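The renamed constructors share a pattern: score feature pairs, apply a cutoff, and return a weighted adjacency. A rough sketch of the `correlation_network` idea (the real signature in `bioneuralnet.network` may differ; the `threshold` parameter here is an assumption for illustration):

```python
import numpy as np
import pandas as pd

def correlation_network(X: pd.DataFrame, threshold: float = 0.3) -> pd.DataFrame:
    """Weighted adjacency from absolute Pearson correlation between features.

    Entries below `threshold` are zeroed and the diagonal is cleared, so the
    result can be used directly as a feature-feature network.
    """
    vals = X.corr().abs().to_numpy()
    vals[vals < threshold] = 0.0      # drop weak edges
    np.fill_diagonal(vals, 0.0)       # no self-loops
    return pd.DataFrame(vals, index=X.columns, columns=X.columns)

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
A = correlation_network(X, threshold=0.2)
```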

### Utils Module
- **`impute_omics` / `impute_omics_knn` renamed**: Now `impute_simple` and `impute_knn`.
- **`normalize_omics` renamed**: Now `normalize`; supports `"standard"`, `"minmax"`, `"log2"`.
- **`beta_to_m` renamed**: Now `m_transform`.
- **New `feature_selection` submodule**: `laplacian_score`, `mad_filter`, `pca_loadings`, `correlation_filter`, `importance_rf`, `variance_threshold`, `top_anova_f_features`.
- **New `data` functions**: `data_stats`, `sparse_filter`, `nan_summary`, `zero_summary`.
- **`clean_internal`**: New cleaning function with configurable NaN threshold.
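For reference, the beta-to-M conversion behind `m_transform` is the standard logit-base-2 transform from methylation analysis. A minimal sketch, assuming a DataFrame input and an `eps` clipping guard (both assumptions, not the library's documented signature):

```python
import numpy as np
import pandas as pd

def m_transform(beta: pd.DataFrame, eps: float = 1e-6) -> pd.DataFrame:
    """Convert methylation beta-values in (0, 1) to M-values:
    M = log2(beta / (1 - beta))."""
    b = beta.clip(eps, 1 - eps)  # guard against log2(0) and division by zero
    return np.log2(b / (1 - b))
```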

### DPMON Enhancements
- **`tune_trials`**: Already introduced in 1.2.2; now fully documented.
- **`ae_architecture`**: New parameter; supports `"original"` and `"dynamic"` autoencoder architectures.
- **`correlation_mode`**: New parameter; supports `"abs_pearson"` (default) and `"adaptive"` node feature computation.
- **Inner CV tuning**: Ray Tune now performs epoch-synchronized inner k-fold cross-validation across all trials.
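The inner-CV tuning amounts to scoring each Ray Tune trial by the mean of k validation folds rather than a single split. A schematic version of the fold handling only (the actual DPMON training loop and epoch synchronization are omitted):

```python
import numpy as np

def inner_cv_score(n_samples: int, k: int, score_fn, seed: int = 0) -> float:
    """Evaluate one hyperparameter trial as the mean validation score
    over k inner folds. `score_fn(train_idx, val_idx)` stands in for
    training and scoring the model on one fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(score_fn(train, val))
    return float(np.mean(scores))
```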

### Datasets
- **PAAD removed** from built-in datasets.
- **Dataset size reduction**: The BRCA, LGG, and KIPAN datasets were reduced from ~4,000 omics features each to 700 (400 methylation, 200 mRNA, 100 miRNA) using Laplacian Score filtering, which replaces the previous ANOVA-F & Random Forest intersection strategy. This keeps the package within the PyPI 100 MB size limit (v1.2.2 reached 97.9 MB) and makes installs and downloads substantially faster.
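For context, the Laplacian Score (He et al., 2005) used for this filtering favors features that vary smoothly over a kNN graph of the samples (lower score = better). This is an illustrative re-implementation, not BioNeuralNet's own `laplacian_score`:

```python
import numpy as np

def laplacian_score(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Laplacian Score per feature for samples-by-features X.

    Builds a symmetric kNN heat-kernel affinity over samples, then scores
    each feature f by (f~' L f~) / (f~' D f~), f~ = D-mean-centered f."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
    sigma2 = np.median(d2[d2 > 0])                        # heat-kernel bandwidth
    W = np.exp(-d2 / sigma2)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]               # k nearest, excluding self
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nn.ravel()] = True
    W = np.where(mask | mask.T, W, 0.0)                   # symmetric kNN affinity
    D = W.sum(axis=1)
    L = np.diag(D) - W                                    # graph Laplacian
    scores = []
    for f in X.T:
        f_t = f - (f @ D) / D.sum()                       # remove D-weighted mean
        den = (f_t * D) @ f_t
        scores.append((f_t @ L @ f_t) / den if den > 0 else np.inf)
    return np.asarray(scores)
```

A feature aligned with the sample geometry (e.g. cluster structure) scores lower than pure noise, which is what makes the score usable as a filter.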

### Documentation
- **Data Decision Framework**: New comprehensive stage-by-stage parameter reference (`quick_start/data_framework.rst`).
- **Quick Start notebooks**: New home for end-to-end `Quick_Start.ipynb` and `quick_start_bio.rst`.
- **Subgraph page**: Updated case studies from KIPAN to TCGA-LGG and ROSMAP with full algorithm documentation.
- **`network.rst`**: New dedicated page for the network module.
- **`utils.rst`**, **`datasets.rst`**, **`index.rst`**, **`subgraph.rst`**: Major updates throughout.
- **README**: GitHub README updated to reflect all API changes, new images, and corrected function names.

### Removed
- `gen_similarity_graph`, `gen_correlation_graph`, `gen_threshold_graph`, `gen_gaussian_knn_graph` from `bioneuralnet.utils`.
- `graph_analysis`, `repair_graph_connectivity`, `find_optimal_graph` from `bioneuralnet.utils` (superseded by `NetworkAnalyzer` and `network_search`).
- `impute_omics`, `impute_omics_knn`, `normalize_omics`, `beta_to_m` (renamed, see above).

### Testing
- Test suite updated to align with new `network` module and renamed utils functions.

## [1.2.2] - 2025-12-29

### Documentation
4 changes: 2 additions & 2 deletions README.md
@@ -8,7 +8,7 @@
[![Documentation](https://img.shields.io/badge/docs-read%20the%20docs-blue.svg)](https://bioneuralnet.readthedocs.io/en/latest/)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17503083.svg)](https://doi.org/10.5281/zenodo.17503083)

## Welcome to BioNeuralNet 1.2.2
## Welcome to BioNeuralNet 1.3.0

![BioNeuralNet Logo](assets/logo_update.png)

@@ -386,4 +386,4 @@ For your convenience, you can use the following BibTeX entry:
url={https://arxiv.org/abs/2507.20440},
}
```
</details>
Binary file modified assets/logo_update.png
Binary file added assets/logo_update2.png
9 changes: 5 additions & 4 deletions bioneuralnet/__init__.py
@@ -13,7 +13,7 @@

"""

__version__ = "1.2.2"
__version__ = "1.3.0"

# submodules to enable direct imports such as `from bioneuralnet import utils`
from . import utils
@@ -51,24 +51,25 @@

__all__ = [
"__version__",

"utils",
"metrics",
"datasets",
"clustering",
"network_embedding",
"downstream_task",
"network",
"external_tools",

"GNNEmbedding",
"SubjectRepresentation",
"auto_pysmccnet",
"DPMON",

"DatasetLoader",
"CorrelatedPageRank",
"CorrelatedLouvain",

"HybridLouvain",

"load_example",
12 changes: 6 additions & 6 deletions bioneuralnet/clustering/__init__.py
@@ -1,16 +1,16 @@
r"""Network Clustering and Subgraph Detection.

This module implements hybrid algorithms for identifying phenotype-associated
subgraphs in multi-omics networks. It combines global modularity optimization
with local random-walk refinement, weighted by phenotypic correlation.

Classes:
HybridLouvain: The primary pipeline. Iteratively alternates between global partitioning
(Louvain) and local refinement (PageRank) to find the most significant
subgraph associated with a phenotype.
CorrelatedLouvain: Extends standard Louvain by optimizing a hybrid objective:
Q_hybrid = k_L * Modularity + (1 - k_L) * Correlation.
CorrelatedPageRank: Performs a biased random walk (PageRank) followed by a sweep cut to
minimize a hybrid conductance objective:
Phi_hybrid = k_P * Conductance + (1 - k_P) * Correlation.
Louvain: Standard Louvain community detection (based on modularity maximization).
@@ -27,4 +27,4 @@
"CorrelatedLouvain",
"HybridLouvain",
"Louvain"
]
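The hybrid objective documented above can be made concrete with a small, self-contained sketch (dense-matrix modularity; the library itself operates on `networkx` graphs, and `rho` here stands in for the PC1-phenotype correlation):

```python
import numpy as np

def modularity(A: np.ndarray, labels: np.ndarray) -> float:
    """Newman modularity Q for a symmetric weighted adjacency matrix A."""
    two_m = A.sum()                      # 2m for an undirected graph stored symmetrically
    k = A.sum(axis=1)                    # weighted degrees
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

def q_hybrid(A: np.ndarray, labels: np.ndarray, rho: float, k_L: float = 0.2) -> float:
    """Q_hybrid = k_L * Q + (1 - k_L) * |rho|, the CorrelatedLouvain objective."""
    return k_L * modularity(A, labels) + (1 - k_L) * abs(rho)
```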
34 changes: 17 additions & 17 deletions bioneuralnet/clustering/correlated_louvain.py
@@ -1,41 +1,41 @@
r"""
Correlated Louvain Community Detection.

This module extends the standard Louvain algorithm by incorporating an
absolute phenotype-correlation objective into the modularity maximization
process.

References:
Abdel-Hafiz et al. (2022), "Significant Subgraph Detection in
Multi-omics Networks for Disease Pathway Identification,"
Frontiers in Big Data.

Notes:
**Hybrid Modularity Objective**
The algorithm optimizes connectivity and phenotype correlation
simultaneously using the following weighted objective function:

.. math::
Q_{hybrid} = k_L Q + (1 - k_L) \rho

Where:
* :math:`Q`: Standard modularity (internal connectivity).
* :math:`\\rho`: Absolute Pearson correlation of the community's
first principal component (PC1) with phenotype :math:`Y`.
* :math:`k_L`: User-defined weight on modularity (Suggested: 0.2).

Algorithm:
The hierarchical loop and Phase 2 (network aggregation) remain
identical to the standard Louvain method. The modification occurs
exclusively in **Phase 1 (Local Optimization)**.

When evaluating the movement of node :math:`v` from community :math:`D`
to community :math:`C`, the gain is calculated as:

.. math::
\Delta_{hybrid} = k_L \Delta Q + (1 - k_L) \Delta \\rho

The correlation gain :math:`\Delta \\rho` is defined as the change in
total correlation across affected communities:

.. math::
@@ -86,10 +86,10 @@ def __init__(
raise ValueError(f"k_L must be in [0, 1], got {k_L}")

super().__init__(
G=G,
weight=weight,
max_passes=max_passes,
min_delta=min_delta,
seed=seed,
)

@@ -196,7 +196,7 @@ def _collect_orig(
s: Set[int] = set()
for idx in np.where(community == comm_id)[0]:
s.update(n2o[int(idx)])

return frozenset(s)

def _correlated_phase1(
@@ -227,7 +227,7 @@ def _correlated_phase1(
for node in order:
cur_comm = int(community[node])
nbr_idx = np.nonzero(A[node])[0]

if len(nbr_idx) == 0:
continue

@@ -383,6 +383,6 @@ def get_top_communities(self, n: int = 1) -> List[Tuple[int, float, List[Any]]]:
continue
idx_set = frozenset(self.node_to_idx[nd] for nd in nds)
ranked.append((cid, self._pc1_correlation(idx_set), nds))

ranked.sort(key=lambda x: x[1], reverse=True)
return ranked[:n]
42 changes: 21 additions & 21 deletions bioneuralnet/clustering/correlated_pagerank.py
@@ -1,17 +1,17 @@
r"""
Correlated PageRank Clustering.

This module implements a personalized PageRank algorithm combined with a
phenotype-aware sweep cut to detect significant subgraphs.

References:
Abdel-Hafiz et al. (2022), "Significant Subgraph Detection in
Multi-omics Networks for Disease Pathway Identification,"
Frontiers in Big Data.

Algorithm:
The PageRank vector is computed as the stationary distribution of:

.. math::
pr_{\\alpha}(s) = \\alpha s + (1 - \\alpha) pr_{\\alpha}(s) W

@@ -21,13 +21,13 @@
* :math:`W`: Transition matrix.

.. important::
The `networkx.pagerank` implementation uses an `alpha` parameter
representing the **damping factor** (link-following probability).
Therefore, :math:`\\text{nx_alpha} = 1 - \\alpha_{theoretical}`.

Notes:
**Sweep Cut Optimization**
Nodes are sorted by PageRank-per-degree in descending order. For each
prefix set :math:`S_i`, the algorithm minimizes the **Hybrid Conductance**:

.. math::
@@ -39,13 +39,13 @@
* :math:`k_P`: Trade-off weight (Default: ~0.5).

**Personalization Vector (Seed Weighting)**
Teleportation probabilities for seeds are weighted by their marginal
contribution to correlation:

.. math::
\\alpha_i = \\frac{\\rho_i}{\\max(\\rho_{seeds})} \\cdot \\alpha_{max}

Where :math:`\\rho_i = |\\rho(S)| - |\\rho(S \setminus \{i\})|`.
Values where :math:`\\rho_i < 0` are clamped to 0.
"""

@@ -103,13 +103,13 @@ def __init__(

if not 0.0 <= teleport_prob <= 1.0:
raise ValueError(f"teleport_prob must be in [0, 1], got {teleport_prob}")

self.teleport_prob = teleport_prob
self._nx_alpha = 1.0 - teleport_prob

if not 0.0 <= k_P <= 1.0:
raise ValueError(f"k_P must be in [0, 1], got {k_P}")

self.k_P = k_P
self.max_iter = max_iter
self.tol = tol
@@ -135,14 +135,14 @@ def _validate_inputs(self):
"""
if not isinstance(self.G, nx.Graph):
raise TypeError("graph must be a networkx.Graph")

if not isinstance(self.B, pd.DataFrame):
raise TypeError("omics_data must be a pandas DataFrame")

graph_nodes = set(str(n) for n in self.G.nodes())
omics_cols = set(str(c) for c in self.B.columns)
missing = graph_nodes - omics_cols

if missing:
logger.warning(
f"{len(missing)} graph nodes missing from omics columns "
@@ -174,7 +174,7 @@ def phen_omics_corr(self, nodes: List[Any]) -> Tuple[float, float]:
return 0.0, 1.0

B_sub = self.B[valid_cols]

if B_sub.shape[0] < 2:
return 0.0, 1.0

@@ -199,7 +199,7 @@ def phen_omics_corr(self, nodes: List[Any]) -> Tuple[float, float]:
pc1, y_vals = pc1[:n_limit], y_vals[:n_limit]

corr, pvalue = pearsonr(pc1, y_vals)

return (float(corr), float(pvalue)) if np.isfinite(corr) else (0.0, 1.0)

except Exception as e:
@@ -255,7 +255,7 @@ def sweep_cut(self, pr_scores: Dict[Any, float]) -> Dict[str, Any]:

vol_S = sum(d for _, d in self.G.degree(current_cluster, weight="weight"))
vol_T = sum(d for _, d in self.G.degree(complement, weight="weight"))

if min(vol_S, vol_T) == 0:
continue

@@ -310,7 +310,7 @@ def generate_weighted_personalization(
if not nodes_excl:
contributions.append(0.0)
continue

corr_excl, _ = self.phen_omics_corr(nodes_excl)
rho_i = abs_total - abs(corr_excl)
contributions.append(rho_i)
@@ -347,12 +347,12 @@ def run(self, seed_nodes: List[Any]) -> Dict[str, Any]:

graph_nodes = set(self.G.nodes())
missing = set(seed_nodes) - graph_nodes

if missing:
raise ValueError(f"Seed nodes not in graph: {missing}")

personalization = self.generate_weighted_personalization(seed_nodes)

logger.info(
f"Personalization: {len(personalization)} nodes, "
f"max_weight={max(personalization.values()):.4f}, "
@@ -392,4 +392,4 @@ def run(self, seed_nodes: List[Any]) -> Dict[str, Any]:
else:
logger.warning("Sweep cut found no valid cluster.")

return results