Tutorial: Plugging User-Designed Methods into DANCE 2.0 for Auto-Search
In this notebook, we walk through how to integrate a new algorithm (specifically, an SVM classifier) into the DANCE auto-search framework. We will:
- Inherit from BaseClassificationMethod (or another suitable base class) to define our custom method.
- Implement the required interfaces (fit, predict, and optionally preprocessing_pipeline).
- Show how to run the hyperparameter search using the integrated method.
- Provide an example main.py-like script that demonstrates how the auto-search process is orchestrated.
1. Folder Structure & Requirements
Before diving in, ensure you have the following directory structure (at least conceptually; your actual project structure can be more extensive):
examples/tuning/
└── classification_svm/
    ├── main.py
    ├── tutorial.ipynb
    └── dataset_name/
        ├── pipeline_params_tuning_config.yaml
        └── config_yamls/
            ├── 0_test_acc_params_tuning_config.yaml
            ├── 1_test_acc_params_tuning_config.yaml
            └── 2_test_acc_params_tuning_config.yaml
Here, classification_svm is the directory we created for our new algorithm. The same pattern applies to other methods, such as clustering_kmeans or regression_linreg.
We’ll focus on the SVM example below.
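The dataset_name/ folder in the tree above is not arbitrary: the main.py script in Section 3 derives it from the dataset ids, joining the ids of one split with "-" and the splits with "_". The following is a minimal sketch of that naming rule; the helper names dataset_dir_name and tuning_config_path are hypothetical and are not part of DANCE.

```python
from pathlib import Path
from typing import Optional, Sequence


def dataset_dir_name(train: Sequence[int], valid: Optional[Sequence[int]], test: Sequence[int]) -> str:
    """Join ids within a split by '-' and splits by '_', skipping missing splits."""
    splits = [train, valid, test]
    return "_".join("-".join(str(i) for i in ids) for ids in splits if ids is not None)


def tuning_config_path(root: str, name: str, tune_mode: str = "pipeline_params") -> Path:
    """Location where main.py looks for the tuning config of a given tune mode."""
    return Path(root) / name / f"{tune_mode}_tuning_config.yaml"


name = dataset_dir_name([328], None, [138])
print(name)  # 328_138
print(tuning_config_path("examples/tuning/classification_svm", name).as_posix())
```

With the default arguments used later (train dataset 328, test dataset 138, no validation dataset), the folder is therefore 328_138, which matches the path printed in the run logs.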
2. Defining Our SVM Classifier
Suppose we want to define a custom SVM method for classification. We’ll inherit from BaseClassificationMethod and implement the required methods.
[1]:
from typing import Optional

import numpy as np
from sklearn.svm import SVC

from dance.modules.base import BaseClassificationMethod
from dance.transforms.cell_feature import WeightedFeaturePCA
from dance.transforms.misc import Compose, SetConfig
from dance.typing import LogLevel


class SVM(BaseClassificationMethod):
    """The SVM cell-type classification model.

    Parameters
    ----------
    args : argparse.Namespace
        A Namespace containing the arguments of SVM. See the parser help for more info.
    prj_path : str
        Project path.
    random_state : int, optional
        Random seed passed to the underlying SVC.

    """

    def __init__(self, args, prj_path="./", random_state: Optional[int] = None):
        self.args = args
        self.random_state = random_state
        # probability=True enables probability estimates on the underlying SVC
        self._mdl = SVC(random_state=random_state, probability=True)

    @staticmethod
    def preprocessing_pipeline(n_components: int = 400, log_level: LogLevel = "INFO"):
        # Default preprocessing: weighted PCA features plus channel configuration
        return Compose(
            WeightedFeaturePCA(n_components=n_components, split_name="train"),
            SetConfig({
                "feature_channel": "WeightedFeaturePCA",
                "label_channel": "cell_type"
            }),
            log_level=log_level,
        )

    def fit(self, x: np.ndarray, y: np.ndarray):
        """Train the classifier.

        Parameters
        ----------
        x
            Training cell features.
        y
            Training labels.

        """
        self._mdl.fit(x, y)

    def predict(self, x: np.ndarray):
        """Predict cell labels.

        Parameters
        ----------
        x
            Samples to be predicted (samples x features).

        Returns
        -------
        y
            Predicted labels of the input samples.

        """
        return self._mdl.predict(x)
/home/zyxing/dance/dance/utils/matrix.py:178: NumbaExperimentalFeatureWarning: First-class function type feature is experimental
for j in numba.prange(n):
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/numba/np/ufunc/parallel.py:371: NumbaWarning: The TBB threading layer requires TBB version 2021 update 6 or later i.e., TBB_INTERFACE_VERSION >= 12060. Found TBB_INTERFACE_VERSION = 12050. The TBB threading layer is disabled.
warnings.warn(problem)
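One detail of the fit interface is easy to miss: the main.py script in Section 3 calls y_train.argmax(1) before fit, which suggests that the label channel is a one-hot (cells x cell types) matrix, while SVC.fit expects one class label per sample. The sketch below illustrates that conversion on made-up data; the arrays and the accuracy helper are assumptions for illustration and do not reproduce how DANCE scores a model internally.

```python
import numpy as np

# Hypothetical one-hot labels for 4 cells and 3 cell types.
y_onehot = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

# Class indices, the 1-D form that a scikit-learn classifier's fit() expects.
y_idx = y_onehot.argmax(1)
print(y_idx)  # [0 1 2 1]


def accuracy(pred_idx: np.ndarray, y_onehot: np.ndarray) -> float:
    """Fraction of predictions that match the one-hot ground truth."""
    return float((pred_idx == y_onehot.argmax(1)).mean())


pred = np.array([0, 1, 1, 1])  # hypothetical predictions
print(accuracy(pred, y_onehot))  # 0.75
```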
3. Example main.py File
Below is an example of how your main.py might look if you’re adding SVM as one of the classification methods. This file orchestrates the entire pipeline:
- Registering custom preprocessing functions with a decorator (optional).
- Parsing arguments and configuring hyperparameters.
- Defining an evaluation function that:
  - Loads and preprocesses the data.
  - Initializes your model (the new SVM class).
  - Trains and scores the model.
  - Logs results to Weights & Biases (wandb).
- Running the hyperparameter sweep agent (e.g., via wandb_sweep_agent).
- Saving results and optionally generating a second-stage tuning config file.
Note: For demonstration, only relevant code is shown. Adjust as needed for your exact pipeline or data.
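With --tune_mode pipeline_params, the run logs further down show a first sweep over pipeline choices (for example CellPCA versus CellSVD), followed by separate sweeps over the parameters of the selected pipelines. The toy sketch below mimics that two-stage idea in plain Python. It is not DANCE's implementation: the search space, toy_score, and two_stage_search are all hypothetical stand-ins, and the scores are fabricated by a formula purely to make the control flow visible.

```python
import random

# Hypothetical search space: candidate choices for one pipeline step (stage 1)
# and a parameter range per choice (stage 2).
PIPELINE_CHOICES = {"feature.cell": ["CellPCA", "CellSVD"]}
PARAM_RANGES = {
    "CellPCA": {"n_components": (50, 400)},
    "CellSVD": {"n_components": (50, 400)},
}


def toy_score(choice: str, n_components: int = 100) -> float:
    """Stand-in for evaluate_pipeline: a fake, deterministic 'accuracy'."""
    base = {"CellPCA": 0.50, "CellSVD": 0.45}[choice]
    return base + 0.001 * min(n_components, 200) / 2


def two_stage_search(count: int = 2, top_k: int = 2, seed: int = 10) -> dict:
    rng = random.Random(seed)
    # Stage 1 ("pipeline"): rank the candidate pipelines with default parameters.
    ranked = sorted(((toy_score(c), c) for c in PIPELINE_CHOICES["feature.cell"]), reverse=True)
    results = {}
    # Stage 2 ("params"): for each of the top_k pipelines, sample `count` parameter sets.
    for _, choice in ranked[:top_k]:
        low, high = PARAM_RANGES[choice]["n_components"]
        trials = [rng.randint(low, high) for _ in range(count)]
        best = max(trials, key=lambda k: toy_score(choice, k))
        results[choice] = (best, toy_score(choice, best))
    return results


for choice, (n_components, score) in two_stage_search().items():
    print(f"{choice}: n_components={n_components}, score={score:.3f}")
```

In the real workflow, each stage is a wandb sweep and the scoring function is the evaluate_pipeline closure defined in main.py.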
[2]:
"""
Step 1: Preprocessing functions can be registered using register_preprocessor.
In this example, the GaussRandProjFeature preprocessing function is registered within the feature.cell pipeline.
This registered function can later be specified in the configuration file.
"""
from sklearn.random_projection import GaussianRandomProjection

from dance.registry import register_preprocessor
from dance.transforms.base import BaseTransform


# NOTE: register any custom preprocessing function to be used for tuning
@register_preprocessor("feature", "cell", overwrite=True)
class GaussRandProjFeature(BaseTransform):
    """Custom preprocessing to extract cell features via Gaussian random projection."""

    _DISPLAY_ATTRS = ("n_components", "eps")

    def __init__(self, n_components: int = 400, eps: float = 0.1, **kwargs):
        super().__init__(**kwargs)
        self.n_components = n_components
        self.eps = eps

    def __call__(self, data):
        feat = data.get_feature(return_type="numpy")
        grp = GaussianRandomProjection(n_components=self.n_components, eps=self.eps)

        self.logger.info(f"Start generating cell features via Gaussian random projection (d={self.n_components}).")
        # Store the projected features under this transform's output channel
        data.data.obsm[self.out] = grp.fit_transform(feat)

        return data
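To make clear what the registered transform computes, here is a minimal NumPy sketch of a Gaussian random projection: the feature matrix is multiplied by a random matrix whose entries are drawn from N(0, 1 / n_components), which is the distribution scikit-learn's GaussianRandomProjection uses. The function name, the seed handling, and the synthetic input are assumptions for illustration; the transform above delegates to scikit-learn rather than to this code.

```python
import numpy as np


def gaussian_random_projection(x: np.ndarray, n_components: int, seed: int = 0) -> np.ndarray:
    """Project x (samples x features) onto n_components random Gaussian directions."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    # Standard deviation 1 / sqrt(n_components) keeps squared norms unchanged in expectation.
    components = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(n_features, n_components))
    return x @ components


# Synthetic stand-in for a (cells x genes) matrix.
x = np.random.default_rng(1).normal(size=(466, 2000))
z = gaussian_random_projection(x, n_components=400)
print(z.shape)  # (466, 400)

# Squared norms are roughly preserved on average.
ratio = (z**2).sum(axis=1) / (x**2).sum(axis=1)
print(round(float(ratio.mean()), 2))
```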
[ ]:
# Example main.py
import argparse
import gc
import os
import pprint
import random
import sys
from pathlib import Path
from typing import get_args

import numpy as np
import torch
import wandb
from sklearn.random_projection import GaussianRandomProjection

from dance import logger
from dance.datasets.singlemodality import CellTypeAnnotationDataset  # your dataset
from dance.pipeline import PipelinePlaner, get_step3_yaml, run_step3, save_summary_data
from dance.registry import register_preprocessor
from dance.transforms.base import BaseTransform
from dance.typing import LogLevel
from dance.utils import set_seed

# Use the script's directory when run as a file; otherwise fall back to the notebook's directory.
root_path = str(Path(__file__).resolve().parent if "__file__" in globals() else Path("tutorial.ipynb").resolve().parent)

# Import your custom SVM class, e.g.:
# from your_svm_file import SVM
def main(args=None):
    # Step 2: Parse arguments and configure hyperparameters
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("--cache", action="store_true", help="Cache processed data.")
    parser.add_argument("--dense_dim", type=int, default=400, help="dim of PCA")
    parser.add_argument("--gpu", type=int, default=0, help="GPU id, set to -1 for CPU")
    parser.add_argument("--log_level", type=str, default="INFO", choices=get_args(LogLevel))
    parser.add_argument("--species", default="human")
    parser.add_argument("--test_dataset", nargs="+", default=[138], type=int, help="list of dataset id")
    parser.add_argument("--tissue", default="Brain")  # TODO: Add option for different tissue name for train/test
    parser.add_argument("--train_dataset", nargs="+", default=[328], type=int, help="list of dataset id")
    parser.add_argument("--valid_dataset", nargs="+", default=None, type=int, help="list of dataset id")
    parser.add_argument("--tune_mode", default="pipeline_params", choices=["pipeline", "params", "pipeline_params"])
    parser.add_argument("--seed", type=int, default=10)
    parser.add_argument("--count", type=int, default=2)
    parser.add_argument("--sweep_id", type=str, default=None)
    parser.add_argument("--summary_file_path", default="results/pipeline/best_test_acc.csv", type=str)
    parser.add_argument("--root_path", default=root_path, type=str)
    # parse_args(None) falls back to sys.argv, so a single call covers both cases
    args = parser.parse_args(args)

    # Construct the path to the tuning config directory, e.g. <root_path>/328_138
    file_root_path = Path(
        args.root_path, "_".join([
            "-".join([str(num) for num in dataset])
            for dataset in [args.train_dataset, args.valid_dataset, args.test_dataset]
            if dataset is not None
        ])).resolve()
    logger.info(f"\n files is saved in {file_root_path}")

    # Instantiate the pipeline planer from the config file
    pipeline_planer = PipelinePlaner.from_config_file(f"{file_root_path}/{args.tune_mode}_tuning_config.yaml")
    os.environ["WANDB_AGENT_MAX_INITIAL_FAILURES"] = "2000"
    # Step 3: Define the evaluation function
    def evaluate_pipeline(tune_mode=args.tune_mode, pipeline_planer=pipeline_planer):
        """Evaluation function used by wandb_sweep_agent.

        It:
        1. Loads the data.
        2. Applies the preprocessing pipeline.
        3. Trains the model.
        4. Evaluates the model.
        5. Logs the metrics to wandb.

        """
        wandb.init(settings=wandb.Settings(start_method='thread'))
        set_seed(args.seed)

        # Load dataset
        data = CellTypeAnnotationDataset(train_dataset=args.train_dataset, test_dataset=args.test_dataset,
                                         valid_dataset=args.valid_dataset, species=args.species, tissue=args.tissue,
                                         data_dir="../temp_data").load_data()

        # Build the preprocessing pipeline from the current sweep config and apply it
        kwargs = {tune_mode: dict(wandb.config)}
        preprocessing_pipeline = pipeline_planer.generate(**kwargs)
        preprocessing_pipeline(data)

        # Retrieve training / validation / testing data
        x_train, y_train = data.get_train_data()
        y_train_converted = y_train.argmax(1)  # label matrix -> class indices for SVC
        x_valid, y_valid = data.get_val_data()
        x_test, y_test = data.get_test_data()

        # Initialize our custom SVM model and train
        # from your_svm_file import SVM  # Place your SVM import here
        model = SVM(args, random_state=args.seed)
        model.fit(x_train, y_train_converted)

        # Evaluate model
        train_score = model.score(x_train, y_train)
        score = model.score(x_valid, y_valid)
        test_score = model.score(x_test, y_test)

        # Log results to wandb
        wandb.log({"train_acc": train_score, "acc": score, "test_acc": test_score})
        wandb.finish()
    # Step 4: Run the sweep
    entity, project, sweep_id = pipeline_planer.wandb_sweep_agent(
        evaluate_pipeline, sweep_id=args.sweep_id, count=args.count)

    # Step 5: Save summary data (top results, etc.)
    save_summary_data(entity, project, sweep_id, summary_file_path=args.summary_file_path, root_path=file_root_path)

    # Optionally, generate the second-stage (params) tuning configs and run them
    if args.tune_mode in ("pipeline", "pipeline_params"):
        get_step3_yaml(result_load_path=args.summary_file_path, step2_pipeline_planer=pipeline_planer,
                       conf_load_path=f"{Path(args.root_path).resolve().parent}/step3_default_params.yaml",
                       root_path=file_root_path)
    if args.tune_mode == "pipeline_params":
        run_step3(file_root_path, evaluate_pipeline, tune_mode="params", step2_pipeline_planer=pipeline_planer)


if __name__ == "__main__":
    # Pass an empty list so that notebook kernel arguments are not parsed
    main([])
[INFO][2025-08-20 12:25:15,393][dance][main]
files is saved in /home/zyxing/dance/examples/tuning/custom-methods/328_138
[INFO][2025-08-20 12:25:15,411][dance][config] tune mode is set to pipeline_params, tune_mode will first be converted to pipeline
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[INFO][2025-08-20 12:25:17,271][dance][wandb_sweep]
[*] Sweep ID: 15layk3y
[INFO][2025-08-20 12:25:17,272][dance][wandb_sweep_agent] Spawning agent: sweep_id='15layk3y', entity='xzy11632', project='dance-dev', count=2
Create sweep with ID: 15layk3y
Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y
wandb: Agent Starting Run: dduxrl3d with config:
wandb: pipeline.0.filter.gene: FilterGenesPercentile
wandb: pipeline.1.normalize: ColumnSumNormalize
wandb: pipeline.2.filter.gene: FilterGenesRegression
wandb: pipeline.3.feature.cell: CellPCA
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: xzy11632. Use `wandb login --relogin` to force relogin
/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122520-dduxrl3d
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y
[INFO][2025-08-20 12:25:32,694][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:25:32,696][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:25:32,983][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:25:33,104][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:25:33,107][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
[INFO][2025-08-20 12:25:33,354][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:25:33,355][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:25:33,356][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:25:33,357][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:25:33,357][dance][_load_raw_data] Cell-types (n=9):
['OPC',
'astrocytes',
'endothelial',
'fetal_quiescent',
'fetal_replicating',
'hybrid',
'microglia',
'neurons',
'oligodendrocytes']
[INFO][2025-08-20 12:25:33,360][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
uns: 'dance_config'
obsm: 'cell_type'
[INFO][2025-08-20 12:25:33,361][dance][wrapped_func] Took 0:00:00.665608 to load and process data.
[INFO][2025-08-20 12:25:33,361][dance][generate_config] The content in pipeline_params will be converted to pipeline
[INFO][2025-08-20 12:25:33,363][dance][_sanitize_pipeline] Pipeline plan:
['FilterGenesPercentile',
'ColumnSumNormalize',
'FilterGenesRegression',
'CellPCA',
None]
[WARNING][2025-08-20 12:25:33,370][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:25:33,376][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:25:33,618][dance][_filter_enclasc] Start generating cell features using EnClaSC
[WARNING][2025-08-20 12:25:33,641][dance.CellPCA][__call__] n_components=400 must be between 0 and min(n_samples, n_features)=100 with svd_solver='auto'
[INFO][2025-08-20 12:25:33,652][dance.CellPCA][__call__] Generating cell PCA features (466, 100) (k=100)
[INFO][2025-08-20 12:25:33,654][dance.CellPCA][__call__] Top 10 explained variances: [0.11390967 0.07235937 0.05432951 0.04682069 0.04452541 0.03371136
0.02961524 0.0270735 0.025507 0.02293849]
[INFO][2025-08-20 12:25:33,655][dance.CellPCA][__call__] Total explained variance: 100.00%
[INFO][2025-08-20 12:25:33,657][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'
[INFO][2025-08-20 12:25:33,658][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'
Run history:
| acc | ▁ |
| test_acc | ▁ |
| train_acc | ▁ |
Run summary:
| acc | 0.51515 |
| test_acc | 0.10145 |
| train_acc | 0.65649 |
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
./wandb/run-20250820_122520-dduxrl3d/logs
wandb: Agent Starting Run: nxe1pfd6 with config:
wandb: pipeline.0.filter.gene: FilterGenesPercentile
wandb: pipeline.1.normalize: ColumnSumNormalize
wandb: pipeline.2.filter.gene: FilterGenesRegression
wandb: pipeline.3.feature.cell: CellSVD
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122547-nxe1pfd6
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y
[INFO][2025-08-20 12:25:58,365][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:25:58,368][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:25:58,770][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:25:58,914][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:25:58,919][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
[INFO][2025-08-20 12:25:59,103][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:25:59,104][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:25:59,105][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:25:59,106][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:25:59,107][dance][_load_raw_data] Cell-types (n=9):
['OPC',
'astrocytes',
'endothelial',
'fetal_quiescent',
'fetal_replicating',
'hybrid',
'microglia',
'neurons',
'oligodendrocytes']
[INFO][2025-08-20 12:25:59,110][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
uns: 'dance_config'
obsm: 'cell_type'
[INFO][2025-08-20 12:25:59,111][dance][wrapped_func] Took 0:00:00.743655 to load and process data.
[INFO][2025-08-20 12:25:59,111][dance][generate_config] The content in pipeline_params will be converted to pipeline
[INFO][2025-08-20 12:25:59,114][dance][_sanitize_pipeline] Pipeline plan:
['FilterGenesPercentile',
'ColumnSumNormalize',
'FilterGenesRegression',
'CellSVD',
None]
[WARNING][2025-08-20 12:25:59,121][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:25:59,127][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:25:59,320][dance][_filter_enclasc] Start generating cell features using EnClaSC
[WARNING][2025-08-20 12:25:59,340][dance.CellSVD][__call__] n_components=400 must be between 0 and min(n_samples, n_features)=100 with svd_solver='full'
[INFO][2025-08-20 12:25:59,388][dance.CellSVD][__call__] Generating cell SVD features (466, 100) (k=100)
[INFO][2025-08-20 12:25:59,389][dance.CellSVD][__call__] Top 10 explained variances: [0.0475235 0.10532387 0.06999493 0.05225454 0.04628773 0.03516576
0.03369351 0.02756703 0.02625577 0.02297754]
[INFO][2025-08-20 12:25:59,390][dance.CellSVD][__call__] Total explained variance: 100.00%
[INFO][2025-08-20 12:25:59,391][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'
[INFO][2025-08-20 12:25:59,392][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'
Run history:
| acc | ▁ |
| test_acc | ▁ |
| train_acc | ▁ |
Run summary:
| acc | 0.5 |
| test_acc | 0.0942 |
| train_acc | 0.64122 |
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
./wandb/run-20250820_122547-nxe1pfd6/logs
[INFO][2025-08-20 12:26:09,505][dance][wandb_sweep]
[*] Sweep ID: 1f9pschy
[INFO][2025-08-20 12:26:09,506][dance][wandb_sweep_agent] Spawning agent: sweep_id='1f9pschy', entity='xzy11632', project='dance-dev', count=2
Create sweep with ID: 1f9pschy
Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy
wandb: Agent Starting Run: 69ew4oa2 with config:
wandb: params.0.FilterGenesPercentile.max_val: 98
wandb: params.0.FilterGenesPercentile.min_val: 8
wandb: params.0.FilterGenesPercentile.mode: rv
wandb: params.1.ColumnSumNormalize.eps: 0.7
wandb: params.1.ColumnSumNormalize.mode: minmax
wandb: params.2.FilterGenesRegression.method: scmap
wandb: params.2.FilterGenesRegression.num_genes: 5388
wandb: params.3.CellPCA.n_components: 227
wandb: params.3.CellPCA.svd_solver: arpack
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122613-69ew4oa2
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy
[INFO][2025-08-20 12:26:24,459][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:26:24,461][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:26:24,854][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:26:24,996][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:26:25,000][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
[INFO][2025-08-20 12:26:25,171][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:26:25,173][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:26:25,173][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:26:25,175][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:26:25,175][dance][_load_raw_data] Cell-types (n=9):
['OPC',
'astrocytes',
'endothelial',
'fetal_quiescent',
'fetal_replicating',
'hybrid',
'microglia',
'neurons',
'oligodendrocytes']
[INFO][2025-08-20 12:26:25,179][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
uns: 'dance_config'
obsm: 'cell_type'
[INFO][2025-08-20 12:26:25,180][dance][wrapped_func] Took 0:00:00.720475 to load and process data.
[INFO][2025-08-20 12:26:25,182][dance][_sanitize_params] Params plan:
[{'max_val': 98, 'min_val': 8, 'mode': 'rv'},
{'eps': 0.7, 'mode': 'minmax'},
{'method': 'scmap', 'num_genes': 5388},
{'n_components': 227, 'svd_solver': 'arpack'},
None]
[WARNING][2025-08-20 12:26:25,193][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:26:25,200][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:490: RuntimeWarning: invalid value encountered in divide
gene_summary = np.nan_to_num(np.array(x.var(0) / x.mean(0)), posinf=0, neginf=0).ravel()
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:26:25,452][dance][_filter_scmap] Start generating cell features using scmap
[INFO][2025-08-20 12:26:26,328][dance.CellPCA][__call__] Generating cell PCA features (466, 5388) (k=227)
[INFO][2025-08-20 12:26:26,330][dance.CellPCA][__call__] Top 10 explained variances: [0.05885801 0.0215941 0.01331656 0.01141509 0.00957707 0.00813536
0.00765066 0.00701284 0.00689292 0.00644914]
[INFO][2025-08-20 12:26:26,331][dance.CellPCA][__call__] Total explained variance: 76.84%
[INFO][2025-08-20 12:26:26,332][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'
[INFO][2025-08-20 12:26:26,333][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'
Run history:
| acc | ▁ |
| test_acc | ▁ |
| train_acc | ▁ |
Run summary:
| acc | 0.71212 |
| test_acc | 0.36232 |
| train_acc | 0.89695 |
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
./wandb/run-20250820_122613-69ew4oa2/logs
wandb: Agent Starting Run: h8j0oo74 with config:
wandb: params.0.FilterGenesPercentile.max_val: 96
wandb: params.0.FilterGenesPercentile.min_val: 1
wandb: params.0.FilterGenesPercentile.mode: cv
wandb: params.1.ColumnSumNormalize.eps: 0.3
wandb: params.1.ColumnSumNormalize.mode: standardize
wandb: params.2.FilterGenesRegression.method: seurat3
wandb: params.2.FilterGenesRegression.num_genes: 9419
wandb: params.3.CellPCA.n_components: 636
wandb: params.3.CellPCA.svd_solver: arpack
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122640-h8j0oo74
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy
[INFO][2025-08-20 12:26:51,507][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:26:51,512][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:26:51,877][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:26:52,016][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:26:52,020][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
[INFO][2025-08-20 12:26:52,175][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:26:52,177][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:26:52,177][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:26:52,178][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:26:52,179][dance][_load_raw_data] Cell-types (n=9):
['OPC',
'astrocytes',
'endothelial',
'fetal_quiescent',
'fetal_replicating',
'hybrid',
'microglia',
'neurons',
'oligodendrocytes']
[INFO][2025-08-20 12:26:52,181][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
uns: 'dance_config'
obsm: 'cell_type'
[INFO][2025-08-20 12:26:52,182][dance][wrapped_func] Took 0:00:00.670366 to load and process data.
[INFO][2025-08-20 12:26:52,183][dance][_sanitize_params] Params plan:
[{'max_val': 96, 'min_val': 1, 'mode': 'cv'},
{'eps': 0.3, 'mode': 'standardize'},
{'method': 'seurat3', 'num_genes': 9419},
{'n_components': 636, 'svd_solver': 'arpack'},
None]
[WARNING][2025-08-20 12:26:52,190][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:26:52,195][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:488: RuntimeWarning: invalid value encountered in divide
gene_summary = np.nan_to_num(np.array(x.std(0) / x.mean(0)), posinf=0, neginf=0).ravel()
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:26:52,411][dance][_filter_seurat3] Start generating cell features using Seurat v3.0
[WARNING][2025-08-20 12:26:52,460][dance.CellPCA][__call__] n_components=636 must be between 0 and min(n_samples, n_features)=466 with svd_solver='arpack'
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
./wandb/run-20250820_122640-h8j0oo74/logs
Run h8j0oo74 errored:
Traceback (most recent call last):
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py", line 308, in _run_job
self._function()
File "/tmp/ipykernel_715844/3979844991.py", line 88, in evaluate_pipeline
preprocessing_pipeline(data)
File "/home/zyxing/dance/dance/pipeline.py", line 128, in __call__
return self.functional(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/dance/dance/pipeline.py", line 247, in bounded_functional
a(*args, **kwargs)
File "/home/zyxing/dance/dance/pipeline.py", line 128, in __call__
return self.functional(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/dance/dance/utils/wrappers.py", line 128, in new_call
return original_call(self, data, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/dance/dance/transforms/cell_feature.py", line 177, in __call__
cell_feat = pca.fit_transform(feat)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
return fit_method(estimator, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py", line 460, in fit_transform
U, S, Vt = self._fit(X)
^^^^^^^^^^^^
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py", line 512, in _fit
return self._fit_truncated(X, n_components, self._fit_svd_solver)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py", line 592, in _fit_truncated
raise ValueError(
ValueError: n_components=466 must be strictly less than min(n_samples, n_features)=466 with svd_solver='arpack'
wandb: ERROR Run h8j0oo74 errored:
wandb: ERROR Traceback (most recent call last):
wandb: ERROR File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py", line 308, in _run_job
wandb: ERROR self._function()
wandb: ERROR File "/tmp/ipykernel_715844/3979844991.py", line 88, in evaluate_pipeline
wandb: ERROR preprocessing_pipeline(data)
wandb: ERROR File "/home/zyxing/dance/dance/pipeline.py", line 128, in __call__
wandb: ERROR return self.functional(*args, **kwargs)
wandb: ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wandb: ERROR File "/home/zyxing/dance/dance/pipeline.py", line 247, in bounded_functional
wandb: ERROR a(*args, **kwargs)
wandb: ERROR File "/home/zyxing/dance/dance/pipeline.py", line 128, in __call__
wandb: ERROR return self.functional(*args, **kwargs)
wandb: ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wandb: ERROR File "/home/zyxing/dance/dance/utils/wrappers.py", line 128, in new_call
wandb: ERROR return original_call(self, data, *args, **kwargs)
wandb: ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wandb: ERROR File "/home/zyxing/dance/dance/transforms/cell_feature.py", line 177, in __call__
wandb: ERROR cell_feat = pca.fit_transform(feat)
wandb: ERROR ^^^^^^^^^^^^^^^^^^^^^^^
wandb: ERROR File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
wandb: ERROR data_to_wrap = f(self, X, *args, **kwargs)
wandb: ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^
wandb: ERROR File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
wandb: ERROR return fit_method(estimator, *args, **kwargs)
wandb: ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wandb: ERROR File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py", line 460, in fit_transform
wandb: ERROR U, S, Vt = self._fit(X)
wandb: ERROR ^^^^^^^^^^^^
wandb: ERROR File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py", line 512, in _fit
wandb: ERROR return self._fit_truncated(X, n_components, self._fit_svd_solver)
wandb: ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wandb: ERROR File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py", line 592, in _fit_truncated
wandb: ERROR raise ValueError(
wandb: ERROR ValueError: n_components=466 must be strictly less than min(n_samples, n_features)=466 with svd_solver='arpack'
wandb: ERROR
[INFO][2025-08-20 12:27:02,289][dance][wandb_sweep]
[*] Sweep ID: cyuki0fw
[INFO][2025-08-20 12:27:02,289][dance][wandb_sweep_agent] Spawning agent: sweep_id='cyuki0fw', entity='xzy11632', project='dance-dev', count=2
Create sweep with ID: cyuki0fw
Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw
wandb: Agent Starting Run: jhryorbj with config:
wandb: params.0.FilterGenesPercentile.max_val: 98
wandb: params.0.FilterGenesPercentile.min_val: 4
wandb: params.0.FilterGenesPercentile.mode: sum
wandb: params.1.ColumnSumNormalize.eps: 0.1
wandb: params.1.ColumnSumNormalize.mode: minmax
wandb: params.2.FilterGenesRegression.method: scmap
wandb: params.2.FilterGenesRegression.num_genes: 7435
wandb: params.3.CellSVD.algorithm: arpack
wandb: params.3.CellSVD.n_components: 793
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122705-jhryorbj
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw
[INFO][2025-08-20 12:27:16,705][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:27:16,707][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:27:17,099][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:27:17,243][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:27:17,247][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
[INFO][2025-08-20 12:27:17,424][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:27:17,425][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:27:17,426][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:27:17,427][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:27:17,428][dance][_load_raw_data] Cell-types (n=9):
['OPC',
'astrocytes',
'endothelial',
'fetal_quiescent',
'fetal_replicating',
'hybrid',
'microglia',
'neurons',
'oligodendrocytes']
[INFO][2025-08-20 12:27:17,430][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
uns: 'dance_config'
obsm: 'cell_type'
[INFO][2025-08-20 12:27:17,430][dance][wrapped_func] Took 0:00:00.723811 to load and process data.
[INFO][2025-08-20 12:27:17,432][dance][_sanitize_params] Params plan:
[{'max_val': 98, 'min_val': 4, 'mode': 'sum'},
{'eps': 0.1, 'mode': 'minmax'},
{'method': 'scmap', 'num_genes': 7435},
{'algorithm': 'arpack', 'n_components': 793},
None]
[WARNING][2025-08-20 12:27:17,442][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:27:17,448][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:27:17,628][dance][_filter_scmap] Start generating cell features using scmap
[WARNING][2025-08-20 12:27:17,662][dance.CellSVD][__call__] n_components=793 must be between 0 and min(n_samples, n_features)=466 with svd_solver='full'
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
./wandb/run-20250820_122705-jhryorbj/logs
Run jhryorbj errored:
Traceback (most recent call last):
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py", line 308, in _run_job
self._function()
File "/tmp/ipykernel_715844/3979844991.py", line 88, in evaluate_pipeline
preprocessing_pipeline(data)
File "/home/zyxing/dance/dance/pipeline.py", line 128, in __call__
return self.functional(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/dance/dance/pipeline.py", line 247, in bounded_functional
a(*args, **kwargs)
File "/home/zyxing/dance/dance/pipeline.py", line 128, in __call__
return self.functional(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/dance/dance/utils/wrappers.py", line 128, in new_call
return original_call(self, data, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/dance/dance/transforms/cell_feature.py", line 275, in __call__
cell_feat = svd.fit_transform(feat)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
return fit_method(estimator, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_truncated_svd.py", line 234, in fit_transform
U, Sigma, VT = svds(X, k=self.n_components, tol=self.tol, v0=v0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py", line 438, in svds
args = _iv(A, k, ncv, tol, which, v0, maxiter, return_singular_vectors,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py", line 44, in _iv
raise ValueError(message)
ValueError: `k` must be an integer satisfying `0 < k < min(A.shape)`.
wandb: Agent Starting Run: 9bwt95uk with config:
wandb: params.0.FilterGenesPercentile.max_val: 95
wandb: params.0.FilterGenesPercentile.min_val: 6
wandb: params.0.FilterGenesPercentile.mode: var
wandb: params.1.ColumnSumNormalize.eps: 0.5
wandb: params.1.ColumnSumNormalize.mode: normalize
wandb: params.2.FilterGenesRegression.method: scmap
wandb: params.2.FilterGenesRegression.num_genes: 1609
wandb: params.3.CellSVD.algorithm: randomized
wandb: params.3.CellSVD.n_components: 539
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122733-9bwt95uk
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw
[INFO][2025-08-20 12:27:44,260][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:27:44,262][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:27:44,657][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:27:44,802][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:27:44,806][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
[INFO][2025-08-20 12:27:44,982][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:27:44,983][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:27:44,984][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:27:44,984][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:27:44,985][dance][_load_raw_data] Cell-types (n=9):
['OPC',
'astrocytes',
'endothelial',
'fetal_quiescent',
'fetal_replicating',
'hybrid',
'microglia',
'neurons',
'oligodendrocytes']
[INFO][2025-08-20 12:27:44,987][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
uns: 'dance_config'
obsm: 'cell_type'
[INFO][2025-08-20 12:27:44,988][dance][wrapped_func] Took 0:00:00.726977 to load and process data.
[INFO][2025-08-20 12:27:44,990][dance][_sanitize_params] Params plan:
[{'max_val': 95, 'min_val': 6, 'mode': 'var'},
{'eps': 0.5, 'mode': 'normalize'},
{'method': 'scmap', 'num_genes': 1609},
{'algorithm': 'randomized', 'n_components': 539},
None]
[WARNING][2025-08-20 12:27:45,000][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:27:45,005][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:27:45,232][dance][_filter_scmap] Start generating cell features using scmap
[WARNING][2025-08-20 12:27:45,256][dance.CellSVD][__call__] n_components=539 must be between 0 and min(n_samples, n_features)=466 with svd_solver='full'
[INFO][2025-08-20 12:27:46,636][dance.CellSVD][__call__] Generating cell SVD features (466, 1609) (k=466)
[INFO][2025-08-20 12:27:46,639][dance.CellSVD][__call__] Top 10 explained variances: [0.01398817 0.01243401 0.00987709 0.00972114 0.00953149 0.00923472
0.00884004 0.00859845 0.00861041 0.00851821]
[INFO][2025-08-20 12:27:46,640][dance.CellSVD][__call__] Total explained variance: 100.00%
[INFO][2025-08-20 12:27:46,641][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'
[INFO][2025-08-20 12:27:46,642][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'
Run history:
| acc | ▁ |
| test_acc | ▁ |
| train_acc | ▁ |
Run summary:
| acc | 0.36364 |
| test_acc | 0.06522 |
| train_acc | 0.67176 |
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
./wandb/run-20250820_122733-9bwt95uk/logs
3. Auto-Search Configuration
The configuration files (e.g., pipeline_params_tuning_config.yaml, pipeline_tuning_config.yaml, params_tuning_config.yaml) drive the auto-search. Each file specifies how to vary the preprocessing pipeline, the model hyperparameters, or both. For example:
# pipeline_params_tuning_config.yaml
type: preprocessor
tune_mode: pipeline_params
pipeline_tuning_top_k: 2
parameter_tuning_freq_n: 2
pipeline:
  - type: filter.gene
    include:
      - FilterGenesPercentile
      - FilterGenesScanpyOrder
      - FilterGenesPlaceHolder
    default_params:
      FilterGenesScanpyOrder:
        order: ["min_counts", "min_cells", "max_counts", "max_cells"]
        min_counts: 1
        max_counts: 134732
        min_cells: 1
        max_cells: 401
  - type: normalize
    include:
      - ScaleFeature
      - ScTransform
      - Log1P
      - NormalizeTotal
      - NormalizePlaceHolder
    default_params:
      ScTransform:
        processes_num: 8
  - type: filter.gene
    include:
      # - HighlyVariableGenesLogarithmizedByMeanAndDisp
      - HighlyVariableGenesRawCount
      - HighlyVariableGenesLogarithmizedByTopGenes
      - FilterGenesTopK
      - FilterGenesRegression
      # - FilterGenesNumberPlaceHolder
    default_params:
      FilterGenesTopK:
        num_genes: 100
      FilterGenesRegression:
        num_genes: 100
      HighlyVariableGenesRawCount:
        n_top_genes: 100
      HighlyVariableGenesLogarithmizedByTopGenes:
        n_top_genes: 100
  - type: feature.cell
    include:
      - WeightedFeaturePCA
      - WeightedFeatureSVD
      - CellPCA
      - CellSVD
      - GaussRandProjFeature  # Registered custom preprocessing func
      - FeatureCellPlaceHolder
    params:
      out: feature.cell
      log_level: INFO
  - type: misc
    target: SetConfig
    params:
      config_dict:
        feature_channel: feature.cell
        label_channel: cell_type
wandb:
  entity: xzy11632
  project: dance-dev
  method: grid  # try grid to provide a comprehensive search
  metric:
    name: acc  # val/acc
    goal: maximize
Tips:
In tune_mode=pipeline, the system will only tune the preprocessing pipeline.
In tune_mode=params, the system will only tune the model parameters.
In tune_mode=pipeline_params, the system will do a two-stage search: first for pipelines, then for model parameters.
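A note on search ranges for the feature.cell step: the sweep log above shows runs failing because the sampled n_components (for example 793) is not strictly less than min(n_samples, n_features)=466, which the arpack solver requires. The sketch below is one possible guard to apply before calling a PCA/SVD transform. The helper name clamp_n_components is hypothetical and not part of DANCE, and the assumption that non-arpack solvers accept the bound itself is based only on the successful randomized run in the log (k=466).

```python
def clamp_n_components(n_components, n_samples, n_features, solver="arpack"):
    """Return an n_components value that the given solver can accept.

    Hypothetical helper for illustration; not part of DANCE or scikit-learn.
    """
    limit = min(n_samples, n_features)
    # arpack needs n_components strictly below min(n_samples, n_features).
    # Assumption: other solvers accept a value equal to that bound.
    upper = limit - 1 if solver == "arpack" else limit
    if upper < 1:
        raise ValueError("Data matrix is too small for a decomposition.")
    return max(1, min(int(n_components), upper))


# Shapes and sampled values taken from the sweep log above.
print(clamp_n_components(793, 466, 7435, solver="arpack"))      # 465
print(clamp_n_components(539, 466, 1609, solver="randomized"))  # 466
```

Alternatively, capping the upper end of the n_components search range below the number of cells avoids sampling infeasible values in the first place.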
4. Testing & Execution
After setting everything up:
# Search only the best preprocessing pipeline:
python main.py --tune_mode pipeline
# Search only the best model hyperparameters:
python main.py --tune_mode params
# Joint two-stage search for both pipeline and parameters:
python main.py --tune_mode pipeline_params
Once the search completes, the results are logged to Weights & Biases (wandb). The save_summary_data function writes out a CSV of the top-performing runs. If you selected pipeline_params, the script also generates a default parameter config for the second stage of the search, which run_step3 then runs automatically.
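As a rough illustration of how the --tune_mode flag used in the commands above could be parsed and mapped to search stages, consider the following sketch. It is not the tutorial's actual main.py: the helper names build_parser and plan_stages and the stage labels are hypothetical, and only the flag name and its three accepted values come from the text.

```python
import argparse

# The three modes named in this tutorial.
TUNE_MODES = ("pipeline", "params", "pipeline_params")


def build_parser():
    """Build a minimal CLI parser exposing only --tune_mode (illustrative)."""
    parser = argparse.ArgumentParser(description="Auto-search entry point (sketch).")
    parser.add_argument("--tune_mode", choices=TUNE_MODES, required=True,
                        help="Which part of the search to run.")
    return parser


def plan_stages(tune_mode):
    """Map a tune mode to the ordered list of search stages it implies."""
    if tune_mode == "pipeline_params":
        # Two-stage search: pipelines first, then parameters.
        return ["pipeline", "params"]
    if tune_mode in ("pipeline", "params"):
        return [tune_mode]
    raise ValueError(f"Unknown tune_mode: {tune_mode!r}")


# Parse an explicit argument list so the sketch runs the same way everywhere.
args = build_parser().parse_args(["--tune_mode", "pipeline_params"])
for stage in plan_stages(args.tune_mode):
    print(f"Would run the {stage} search stage")
```

Restricting the flag with choices makes a mistyped mode fail at parse time rather than partway through a sweep.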
5. Summary
By following these steps:
Inherit from the appropriate base class (in our case, BaseClassificationMethod).
Implement the fit, predict, and (optionally) preprocessing_pipeline methods.
Integrate your custom model into the main.py script.
Create and reference the necessary configuration (YAML) files.
Run the search with --tune_mode set to pipeline, params, or pipeline_params.
…you can easily plug in any custom algorithm, ranging from simple classification methods like an SVM to deep learning methods with pretraining steps, into this auto-search framework.
Happy coding and good luck with your hyperparameter searches!