6 changes: 5 additions & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -22,7 +22,11 @@ repos:
hooks:
- id: mypy
name: Static type checking with MyPy
args: [--ignore-missing-imports]
exclude: ^bioneuralnet/network/pysmccnet/
args: [
--ignore-missing-imports,
--follow-imports=silent,
]

- repo: local
hooks:
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,48 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/).

## [1.3.0] - 2026-04-01

### Network Module (`bioneuralnet.network`)
- **New dedicated module**: Network construction and analysis moved from `bioneuralnet.utils` to `bioneuralnet.network`.
- **Renamed construction functions**: `gen_similarity_graph` -> `similarity_network`, `gen_correlation_graph` -> `correlation_network`, `gen_threshold_graph` -> `threshold_network`, `gen_gaussian_knn_graph` -> `gaussian_knn_network`.
- **`NetworkAnalyzer`**: Moved to `bioneuralnet.network`; GPU-accelerated via PyTorch; added `hub_analysis`, `cross_omics_analysis`, `edge_weight_analysis`, `find_strongest_edges`, `degree_distribution`, `clustering_coefficient_gpu`, `connected_components`.
- **`auto_pysmccnet`**: Phenotype-driven network construction via SmCCNet 2.0; supports CCA and PLS modes and is now implemented in native Python, removing the R dependency and simplifying setup.
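The renamed constructors share a pattern: score feature pairs, apply a cutoff, and return a weighted adjacency. A rough sketch of the `correlation_network` idea (the real signature in `bioneuralnet.network` may differ; the `threshold` parameter here is an assumption for illustration):

```python
import numpy as np
import pandas as pd

def correlation_network(X: pd.DataFrame, threshold: float = 0.3) -> pd.DataFrame:
    """Weighted adjacency from absolute Pearson correlation between features.

    Entries below `threshold` are zeroed and the diagonal is cleared, so the
    result can be used directly as a feature-feature network.
    """
    vals = X.corr().abs().to_numpy()
    vals[vals < threshold] = 0.0      # drop weak edges
    np.fill_diagonal(vals, 0.0)       # no self-loops
    return pd.DataFrame(vals, index=X.columns, columns=X.columns)

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
A = correlation_network(X, threshold=0.2)
```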

### Utils Module
- **`impute_omics` / `impute_omics_knn` renamed**: Now `impute_simple` and `impute_knn`.
- **`normalize_omics` renamed**: Now `normalize`; supports `"standard"`, `"minmax"`, `"log2"`.
- **`beta_to_m` renamed**: Now `m_transform`.
- **New `feature_selection` submodule**: `laplacian_score`, `mad_filter`, `pca_loadings`, `correlation_filter`, `importance_rf`, `variance_threshold`, `top_anova_f_features`.
- **New `data` functions**: `data_stats`, `sparse_filter`, `nan_summary`, `zero_summary`.
- **`clean_internal`**: New cleaning function with configurable NaN threshold.
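For reference, the beta-to-M conversion behind `m_transform` is the standard logit-base-2 transform from methylation analysis. A minimal sketch, assuming a DataFrame input and an `eps` clipping guard (both assumptions, not the library's documented signature):

```python
import numpy as np
import pandas as pd

def m_transform(beta: pd.DataFrame, eps: float = 1e-6) -> pd.DataFrame:
    """Convert methylation beta-values in (0, 1) to M-values:
    M = log2(beta / (1 - beta))."""
    b = beta.clip(eps, 1 - eps)  # guard against log2(0) and division by zero
    return np.log2(b / (1 - b))
```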

### DPMON Enhancements
- **`tune_trials`**: Already introduced in 1.2.2; now fully documented.
- **`ae_architecture`**: New parameter; supports `"original"` and `"dynamic"` autoencoder architectures.
- **`correlation_mode`**: New parameter; supports `"abs_pearson"` (default) and `"adaptive"` node feature computation.
- **Inner CV tuning**: Ray Tune now performs epoch-synchronized inner k-fold cross-validation across all trials.
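The inner-CV tuning amounts to scoring each Ray Tune trial by the mean of k validation folds rather than a single split. A schematic version of the fold handling only (the actual DPMON training loop and epoch synchronization are omitted):

```python
import numpy as np

def inner_cv_score(n_samples: int, k: int, score_fn, seed: int = 0) -> float:
    """Evaluate one hyperparameter trial as the mean validation score
    over k inner folds. `score_fn(train_idx, val_idx)` stands in for
    training and scoring the model on one fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(score_fn(train, val))
    return float(np.mean(scores))
```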

### Datasets
- **PAAD removed** from built-in datasets.
- **Dataset size reduction**: The BRCA, LGG, and KIPAN datasets were reduced from ~4,000 omics features each to 700 (400 methylation, 200 mRNA, 100 miRNA) using Laplacian Score filtering, which replaces the previous ANOVA-F & Random Forest intersection strategy. This keeps the package within the PyPI 100 MB size limit (v1.2.2 reached 97.9 MB) and makes installs and downloads substantially faster.
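For context, the Laplacian Score (He et al., 2005) used for this filtering favors features that vary smoothly over a kNN graph of the samples (lower score = better). This is an illustrative re-implementation, not BioNeuralNet's own `laplacian_score`:

```python
import numpy as np

def laplacian_score(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Laplacian Score per feature for samples-by-features X.

    Builds a symmetric kNN heat-kernel affinity over samples, then scores
    each feature f by (f~' L f~) / (f~' D f~), f~ = D-mean-centered f."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
    sigma2 = np.median(d2[d2 > 0])                        # heat-kernel bandwidth
    W = np.exp(-d2 / sigma2)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]               # k nearest, excluding self
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nn.ravel()] = True
    W = np.where(mask | mask.T, W, 0.0)                   # symmetric kNN affinity
    D = W.sum(axis=1)
    L = np.diag(D) - W                                    # graph Laplacian
    scores = []
    for f in X.T:
        f_t = f - (f @ D) / D.sum()                       # remove D-weighted mean
        den = (f_t * D) @ f_t
        scores.append((f_t @ L @ f_t) / den if den > 0 else np.inf)
    return np.asarray(scores)
```

A feature aligned with the sample geometry (e.g. cluster structure) scores lower than pure noise, which is what makes the score usable as a filter.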

### Documentation
- **Data Decision Framework**: New comprehensive stage-by-stage parameter reference (`quick_start/data_framework.rst`).
- **Quick Start notebooks**: New home for end-to-end `Quick_Start.ipynb` and `quick_start_bio.rst`.
- **Subgraph page**: Updated case studies from KIPAN to TCGA-LGG and ROSMAP with full algorithm documentation.
- **`network.rst`**: New dedicated page for the network module.
- **`utils.rst`**, **`datasets.rst`**, **`index.rst`**, **`subgraph.rst`**: Major updates throughout.
- **README**: GitHub README updated to reflect all API changes, new images, and corrected function names.

### Removed
- `gen_similarity_graph`, `gen_correlation_graph`, `gen_threshold_graph`, `gen_gaussian_knn_graph` from `bioneuralnet.utils`.
- `graph_analysis`, `repair_graph_connectivity`, `find_optimal_graph` from `bioneuralnet.utils` (superseded by `NetworkAnalyzer` and `network_search`).
- `impute_omics`, `impute_omics_knn`, `normalize_omics`, `beta_to_m` (renamed, see above).

### Testing
- Test suite updated to align with new `network` module and renamed utils functions.

## [1.2.2] - 2025-12-29

### Documentation
4 changes: 2 additions & 2 deletions README.md
@@ -8,7 +8,7 @@
[![Documentation](https://img.shields.io/badge/docs-read%20the%20docs-blue.svg)](https://bioneuralnet.readthedocs.io/en/latest/)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17503083.svg)](https://doi.org/10.5281/zenodo.17503083)

## Welcome to BioNeuralNet 1.2.2
## Welcome to BioNeuralNet 1.3.0

![BioNeuralNet Logo](assets/logo_update.png)

@@ -386,4 +386,4 @@ For your convenience, you can use the following BibTeX entry:
url={https://arxiv.org/abs/2507.20440},
}
```
</details>
Binary file modified assets/logo_update.png
Binary file added assets/logo_update2.png
9 changes: 5 additions & 4 deletions bioneuralnet/__init__.py
@@ -13,7 +13,7 @@

"""

__version__ = "1.2.2"
__version__ = "1.3.0"

# submodules to enable direct imports such as `from bioneuralnet import utils`
from . import utils
@@ -51,24 +51,25 @@

__all__ = [
"__version__",

"utils",
"metrics",
"datasets",
"clustering",
"network_embedding",
"downstream_task",
"network",
"external_tools",

"GNNEmbedding",
"SubjectRepresentation",
"auto_pysmccnet",
"DPMON",

"DatasetLoader",
"CorrelatedPageRank",
"CorrelatedLouvain",

"HybridLouvain",

"load_example",
12 changes: 6 additions & 6 deletions bioneuralnet/clustering/__init__.py
@@ -1,16 +1,16 @@
r"""Network Clustering and Subgraph Detection.

This module implements hybrid algorithms for identifying phenotype-associated
subgraphs in multi-omics networks. It combines global modularity optimization
with local random-walk refinement, weighted by phenotypic correlation.

Classes:
HybridLouvain: The primary pipeline. Iteratively alternates between global partitioning
(Louvain) and local refinement (PageRank) to find the most significant
subgraph associated with a phenotype.
CorrelatedLouvain: Extends standard Louvain by optimizing a hybrid objective:
Q_hybrid = k_L * Modularity + (1 - k_L) * Correlation.
CorrelatedPageRank: Performs a biased random walk (PageRank) followed by a sweep cut to
minimize a hybrid conductance objective:
Phi_hybrid = k_P * Conductance + (1 - k_P) * Correlation.
Louvain: Standard Louvain community detection (based on modularity maximization).
@@ -27,4 +27,4 @@
"CorrelatedLouvain",
"HybridLouvain",
"Louvain"
]
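The hybrid objective documented above can be made concrete with a small, self-contained sketch (dense-matrix modularity; the library itself operates on `networkx` graphs, and `rho` here stands in for the PC1-phenotype correlation):

```python
import numpy as np

def modularity(A: np.ndarray, labels: np.ndarray) -> float:
    """Newman modularity Q for a symmetric weighted adjacency matrix A."""
    two_m = A.sum()                      # 2m for an undirected graph stored symmetrically
    k = A.sum(axis=1)                    # weighted degrees
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

def q_hybrid(A: np.ndarray, labels: np.ndarray, rho: float, k_L: float = 0.2) -> float:
    """Q_hybrid = k_L * Q + (1 - k_L) * |rho|, the CorrelatedLouvain objective."""
    return k_L * modularity(A, labels) + (1 - k_L) * abs(rho)
```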
34 changes: 17 additions & 17 deletions bioneuralnet/clustering/correlated_louvain.py
@@ -1,41 +1,41 @@
r"""
Correlated Louvain Community Detection.

This module extends the standard Louvain algorithm by incorporating an
absolute phenotype-correlation objective into the modularity maximization
process.

References:
Abdel-Hafiz et al. (2022), "Significant Subgraph Detection in
Multi-omics Networks for Disease Pathway Identification,"
Frontiers in Big Data.

Notes:
**Hybrid Modularity Objective**
The algorithm optimizes connectivity and phenotype correlation
simultaneously using the following weighted objective function:

.. math::
Q_{hybrid} = k_L Q + (1 - k_L) \rho

Where:
* :math:`Q`: Standard modularity (internal connectivity).
* :math:`\\rho`: Absolute Pearson correlation of the community's
first principal component (PC1) with phenotype :math:`Y`.
* :math:`k_L`: User-defined weight on modularity (Suggested: 0.2).

Algorithm:
The hierarchical loop and Phase 2 (network aggregation) remain
identical to the standard Louvain method. The modification occurs
exclusively in **Phase 1 (Local Optimization)**.

When evaluating the movement of node :math:`v` from community :math:`D`
to community :math:`C`, the gain is calculated as:

.. math::
\Delta_{hybrid} = k_L \Delta Q + (1 - k_L) \Delta \\rho

The correlation gain :math:`\Delta \\rho` is defined as the change in
total correlation across affected communities:

.. math::
@@ -86,10 +86,10 @@ def __init__(
raise ValueError(f"k_L must be in [0, 1], got {k_L}")

super().__init__(
G=G,
weight=weight,
max_passes=max_passes,
min_delta=min_delta,
seed=seed,
)

@@ -196,7 +196,7 @@ def _collect_orig(
s: Set[int] = set()
for idx in np.where(community == comm_id)[0]:
s.update(n2o[int(idx)])

return frozenset(s)

def _correlated_phase1(
@@ -227,7 +227,7 @@ def _correlated_phase1(
for node in order:
cur_comm = int(community[node])
nbr_idx = np.nonzero(A[node])[0]

if len(nbr_idx) == 0:
continue

@@ -383,6 +383,6 @@ def get_top_communities(self, n: int = 1) -> List[Tuple[int, float, List[Any]]]:
continue
idx_set = frozenset(self.node_to_idx[nd] for nd in nds)
ranked.append((cid, self._pc1_correlation(idx_set), nds))

ranked.sort(key=lambda x: x[1], reverse=True)
return ranked[:n]
42 changes: 21 additions & 21 deletions bioneuralnet/clustering/correlated_pagerank.py
@@ -1,17 +1,17 @@
r"""
Correlated PageRank Clustering.

This module implements a personalized PageRank algorithm combined with a
phenotype-aware sweep cut to detect significant subgraphs.

References:
Abdel-Hafiz et al. (2022), "Significant Subgraph Detection in
Multi-omics Networks for Disease Pathway Identification,"
Frontiers in Big Data.

Algorithm:
The PageRank vector is computed as the stationary distribution of:

.. math::
pr_{\\alpha}(s) = \\alpha s + (1 - \\alpha) pr_{\\alpha}(s) W

@@ -21,13 +21,13 @@
* :math:`W`: Transition matrix.

.. important::
The `networkx.pagerank` implementation uses an `alpha` parameter
representing the **damping factor** (link-following probability).
Therefore, :math:`\\text{nx_alpha} = 1 - \\alpha_{theoretical}`.

Notes:
**Sweep Cut Optimization**
Nodes are sorted by PageRank-per-degree in descending order. For each
prefix set :math:`S_i`, the algorithm minimizes the **Hybrid Conductance**:

.. math::
@@ -39,13 +39,13 @@
* :math:`k_P`: Trade-off weight (Default: ~0.5).

**Personalization Vector (Seed Weighting)**
Teleportation probabilities for seeds are weighted by their marginal
contribution to correlation:

.. math::
\\alpha_i = \\frac{\\rho_i}{\\max(\\rho_{seeds})} \\cdot \\alpha_{max}

Where :math:`\\rho_i = |\\rho(S)| - |\\rho(S \setminus \{i\})|`.
Values where :math:`\\rho_i < 0` are clamped to 0.
"""

@@ -103,13 +103,13 @@ def __init__(

if not 0.0 <= teleport_prob <= 1.0:
raise ValueError(f"teleport_prob must be in [0, 1], got {teleport_prob}")

self.teleport_prob = teleport_prob
self._nx_alpha = 1.0 - teleport_prob

if not 0.0 <= k_P <= 1.0:
raise ValueError(f"k_P must be in [0, 1], got {k_P}")

self.k_P = k_P
self.max_iter = max_iter
self.tol = tol
@@ -135,14 +135,14 @@ def _validate_inputs(self):
"""
if not isinstance(self.G, nx.Graph):
raise TypeError("graph must be a networkx.Graph")

if not isinstance(self.B, pd.DataFrame):
raise TypeError("omics_data must be a pandas DataFrame")

graph_nodes = set(str(n) for n in self.G.nodes())
omics_cols = set(str(c) for c in self.B.columns)
missing = graph_nodes - omics_cols

if missing:
logger.warning(
f"{len(missing)} graph nodes missing from omics columns "
@@ -174,7 +174,7 @@ def phen_omics_corr(self, nodes: List[Any]) -> Tuple[float, float]:
return 0.0, 1.0

B_sub = self.B[valid_cols]

if B_sub.shape[0] < 2:
return 0.0, 1.0

@@ -199,7 +199,7 @@ def phen_omics_corr(self, nodes: List[Any]) -> Tuple[float, float]:
pc1, y_vals = pc1[:n_limit], y_vals[:n_limit]

corr, pvalue = pearsonr(pc1, y_vals)

return (float(corr), float(pvalue)) if np.isfinite(corr) else (0.0, 1.0)

except Exception as e:
@@ -255,7 +255,7 @@ def sweep_cut(self, pr_scores: Dict[Any, float]) -> Dict[str, Any]:

vol_S = sum(d for _, d in self.G.degree(current_cluster, weight="weight"))
vol_T = sum(d for _, d in self.G.degree(complement, weight="weight"))

if min(vol_S, vol_T) == 0:
continue

@@ -310,7 +310,7 @@ def generate_weighted_personalization(
if not nodes_excl:
contributions.append(0.0)
continue

corr_excl, _ = self.phen_omics_corr(nodes_excl)
rho_i = abs_total - abs(corr_excl)
contributions.append(rho_i)
@@ -347,12 +347,12 @@ def run(self, seed_nodes: List[Any]) -> Dict[str, Any]:

graph_nodes = set(self.G.nodes())
missing = set(seed_nodes) - graph_nodes

if missing:
raise ValueError(f"Seed nodes not in graph: {missing}")

personalization = self.generate_weighted_personalization(seed_nodes)

logger.info(
f"Personalization: {len(personalization)} nodes, "
f"max_weight={max(personalization.values()):.4f}, "
@@ -392,4 +392,4 @@ def run(self, seed_nodes: List[Any]) -> Dict[str, Any]:
else:
logger.warning("Sweep cut found no valid cluster.")

return results