Tutorial: Plugging User-Designed Methods into DANCE 2.0 for Auto-Search

In this notebook, we walk through integrating a new algorithm (here, an SVM classifier) into the DANCE 2.0 auto-search framework. We will:

Inherit from BaseClassificationMethod (or another suitable base class) to define our custom method.
Implement the required interfaces (fit, predict, and optionally preprocessing_pipeline).
Show how to run the hyperparameter search using the integrated method.
Provide an example main.py-like script that demonstrates how the auto-search process is orchestrated.

1. Folder Structure & Requirements

Before diving in, ensure you have the following directory structure (at least conceptually; your actual project structure can be more extensive):

examples/tuning/
└── classification_svm/
    ├── main.py
    ├── tutorial.ipynb
    └── dataset_name/
        ├── pipeline_params_tuning_config.yaml
        └── config_yamls/
            ├── 0_test_acc_params_tuning_config.yaml
            ├── 1_test_acc_params_tuning_config.yaml
            └── 2_test_acc_params_tuning_config.yaml

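Here, classification_svm is the directory created for the new algorithm, and dataset_name stands for the per-dataset folder, whose name main.py derives from the train/valid/test dataset IDs (e.g. 328_138). The same pattern applies to other methods, such as clustering_kmeans or regression_linreg.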
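The dataset folder name follows from the dataset IDs: main.py joins the IDs within each split with `-`, joins the splits with `_`, and skips a split that is `None` (such as an unset validation set). The stdlib sketch below reproduces that rule so you can predict where the tuning config must live; the helper names `dataset_folder_name` and `tuning_config_path` are hypothetical and not part of DANCE.

```python
from pathlib import Path
from typing import List, Optional


def dataset_folder_name(train: List[int], valid: Optional[List[int]], test: List[int]) -> str:
    """Hypothetical helper mirroring how main.py names the per-dataset folder."""
    # IDs within one split are joined with "-", splits are joined with "_";
    # a split that is None (e.g. no separate validation set) is skipped.
    return "_".join("-".join(str(num) for num in ds) for ds in [train, valid, test] if ds is not None)


def tuning_config_path(root: str, train: List[int], valid: Optional[List[int]], test: List[int],
                       tune_mode: str) -> Path:
    """Path of the config main.py reads: <root>/<dataset folder>/<tune_mode>_tuning_config.yaml."""
    return Path(root, dataset_folder_name(train, valid, test), f"{tune_mode}_tuning_config.yaml")


print(dataset_folder_name([328], None, [138]))  # 328_138
print(tuning_config_path("examples/tuning/classification_svm", [328], None, [138], "pipeline_params"))
```

Unlike main.py, the sketch does not call `.resolve()`, so it returns a relative path when given a relative root.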

We’ll focus on the SVM example below.

2. Defining Our SVM Classifier

Suppose we want to define a custom SVM method for classification. We’ll inherit from BaseClassificationMethod and implement the required methods.

[1]:

from typing import Optional
from dance.modules.base import BaseClassificationMethod
from sklearn.svm import SVC
import numpy as np

from dance.transforms.cell_feature import WeightedFeaturePCA
from dance.transforms.misc import Compose, SetConfig
from dance.typing import LogLevel

class SVM(BaseClassificationMethod):
    """The SVM cell-type classification model.

    Parameters
    ----------
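    args : argparse.Namespace
        A Namespace containing the arguments of SVM. See the parser help for more information.
    prj_path : str
        Project path (not used by this model).
    random_state : int, optional
        Random seed passed to sklearn.svm.SVC.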

    """

    def __init__(self, args, prj_path="./", random_state: Optional[int] = None):
        self.args = args
        self.random_state = random_state
        self._mdl = SVC(random_state=random_state, probability=True)

    @staticmethod
    def preprocessing_pipeline(n_components: int = 400, log_level: LogLevel = "INFO"):
        return Compose(
            WeightedFeaturePCA(n_components=n_components, split_name="train"),
            SetConfig({
                "feature_channel": "WeightedFeaturePCA",
                "label_channel": "cell_type"
            }),
            log_level=log_level,
        )

    def fit(self, x: np.ndarray, y: np.ndarray):
        """Train the classifier.

        Parameters
        ----------
        x
            Training cell features.
        y
            Training labels.

        """
        self._mdl.fit(x, y)

    def predict(self, x: np.ndarray):
        """Predict cell labels.

        Parameters
        ----------
        x
            Samples to be predicted (samples x features).

        Returns
        -------
        y
            Predicted labels of the input samples.

        """
        return self._mdl.predict(x)

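Note that `fit` receives integer class labels, while the dataset appears to store cell-type labels as a one-hot matrix (one column per cell type), which is why main.py calls `y_train.argmax(1)` before training. The numpy sketch below shows that conversion and an accuracy computed against one-hot targets; the `one_hot_to_index` and `accuracy` helpers are our own illustration, and we are assuming rather than quoting how `BaseClassificationMethod.score` treats one-hot labels.

```python
import numpy as np


def one_hot_to_index(y: np.ndarray) -> np.ndarray:
    """Convert an (n_cells, n_types) one-hot matrix to integer class indices."""
    return y.argmax(axis=1)


def accuracy(pred: np.ndarray, y_true: np.ndarray) -> float:
    """Accuracy of integer predictions; 2-D (one-hot) targets are converted first."""
    if y_true.ndim == 2:
        y_true = one_hot_to_index(y_true)
    return float((pred == y_true).mean())


cell_types = ["OPC", "astrocytes", "neurons"]
y_onehot = np.eye(len(cell_types))[[0, 2, 1, 2]]  # 4 cells, 3 cell types
y_idx = one_hot_to_index(y_onehot)  # array([0, 2, 1, 2]): what SVC.fit would receive
pred = np.array([0, 2, 2, 2])       # e.g. output of model.predict(x)
print(accuracy(pred, y_onehot))     # 0.75
```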

3. Example `main.py` File

Below is an example of how your main.py might look if you’re adding SVM as one of the classification methods. This file orchestrates the entire pipeline:

Registering custom preprocessing functions with a decorator (optional).
Parsing arguments and configuring hyperparameters.
Defining an evaluation function that:
- Loads and preprocesses the data.
- Initializes the model (the new SVM class).
- Trains and scores the model.
- Logs results to Weights & Biases (wandb).
Running the hyperparameter sweep agent (via pipeline_planer.wandb_sweep_agent).
Saving results and, optionally, generating second-stage tuning config files.

Note: For demonstration, only relevant code is shown. Adjust as needed for your exact pipeline or data.
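Registration (Step 1 below) is what lets a tuning config refer to a preprocessor by name. Conceptually, a registry is a lookup table keyed by pipeline stage, filled in by a decorator when the class is defined. The stdlib sketch below illustrates that idea with a hypothetical `REGISTRY` dict, `register` decorator, and `ToyCellFeature` class; it is not DANCE's `register_preprocessor` implementation, whose internals may differ.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (stage, sub-stage) -> {class name: class}
REGISTRY: Dict[Tuple[str, str], Dict[str, type]] = {}


def register(stage: str, sub_stage: str, overwrite: bool = False) -> Callable[[type], type]:
    """Decorator that records a class under a pipeline stage (illustrative only)."""
    def wrapper(cls: type) -> type:
        table = REGISTRY.setdefault((stage, sub_stage), {})
        if cls.__name__ in table and not overwrite:
            raise KeyError(f"{cls.__name__} is already registered under {stage}.{sub_stage}")
        table[cls.__name__] = cls
        return cls  # the decorated class itself is returned unchanged
    return wrapper


@register("feature", "cell")
class ToyCellFeature:
    """Stand-in for a custom transform such as GaussRandProjFeature."""


# A config entry naming "ToyCellFeature" under feature.cell can then be resolved to the class.
resolved = REGISTRY[("feature", "cell")]["ToyCellFeature"]
print(resolved is ToyCellFeature)  # True
```

In this sketch, re-registering a name raises unless `overwrite=True`; that is one plausible reason the example below passes `overwrite=True`, since re-running a notebook cell redefines the class under the same name.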

[2]:

"""
Step 1: custom preprocessing functions can be registered with the register_preprocessor decorator.
In this example, GaussRandProjFeature is registered under the feature.cell stage of the pipeline,
so it can later be referenced by name in the tuning configuration file.
"""
from sklearn.random_projection import GaussianRandomProjection
from dance.registry import register_preprocessor
from dance.transforms.base import BaseTransform


@register_preprocessor("feature", "cell", overwrite=True)  # NOTE: register any custom preprocessing function to be used for tuning
class GaussRandProjFeature(BaseTransform):
    """Custom preprocessing to extract cell feature via Gaussian random projection."""

    _DISPLAY_ATTRS = ("n_components", "eps")

    def __init__(self, n_components: int = 400, eps: float = 0.1, **kwargs):
        super().__init__(**kwargs)
        self.n_components = n_components
        self.eps = eps

    def __call__(self, data):
        feat = data.get_feature(return_type="numpy")
        grp = GaussianRandomProjection(n_components=self.n_components, eps=self.eps)

        self.logger.info(f"Start generating cell feature via Gaussian random projection (d={self.n_components}).")
        data.data.obsm[self.out] = grp.fit_transform(feat)

        return data
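To make the custom step less of a black box: a Gaussian random projection multiplies the feature matrix by a random matrix whose entries are drawn i.i.d. from N(0, 1/n_components), which approximately preserves pairwise distances (the Johnson-Lindenstrauss lemma). The numpy sketch below implements that idea directly; it is an illustration, not scikit-learn's code, and the helper names are ours. One caveat worth checking against the scikit-learn documentation: `eps` only takes effect when `n_components="auto"`, so with an explicit `n_components` (400 by default here) tuning `eps` should not change the projection.

```python
import numpy as np


def gaussian_random_projection(x: np.ndarray, n_components: int, seed: int = 0) -> np.ndarray:
    """Project rows of x to n_components dims with a Gaussian random matrix (sketch)."""
    rng = np.random.default_rng(seed)
    # Entries ~ N(0, 1/n_components), so squared norms are preserved in expectation.
    components = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(n_components, x.shape[1]))
    return x @ components.T


def jl_min_dim(n_samples: int, eps: float) -> int:
    """Johnson-Lindenstrauss bound on the dimension needed for distortion eps."""
    return int(4 * np.log(n_samples) / (eps**2 / 2 - eps**3 / 3))


rng = np.random.default_rng(1)
x = rng.random((466, 2000))  # toy stand-in for a cells x genes matrix
z = gaussian_random_projection(x, n_components=400)
print(z.shape)  # (466, 400)

# The distance between two cells is roughly preserved (ratio close to 1).
ratio = np.linalg.norm(z[0] - z[1]) / np.linalg.norm(x[0] - x[1])
print(round(float(ratio), 2))
print(jl_min_dim(466, 0.1))  # dimension the eps=0.1 guarantee asks for; compare with 400
```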

[ ]:

# Example main.py

import argparse
import os
import sys
from pathlib import Path
from typing import get_args

import wandb

from dance import logger
from dance.datasets.singlemodality import CellTypeAnnotationDataset  # your dataset
from dance.pipeline import PipelinePlaner, get_step3_yaml, run_step3, save_summary_data
from dance.typing import LogLevel
from dance.utils import set_seed

# Directory containing this script (or the notebook's working directory when run interactively)
root_path = str(Path(__file__).resolve().parent) if "__file__" in globals() else str(Path.cwd())

# Import your custom SVM class. In this notebook, SVM is already defined in an earlier cell;
# in a standalone main.py you would write, e.g.:
# from your_svm_file import SVM
# The custom preprocessor from Step 1 (GaussRandProjFeature) must also be defined or imported
# here, so that it is registered before the sweep starts.


def main(args=None):
    # Step 2: parse arguments and configure hyperparameters
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("--cache", action="store_true", help="Cache processed data.")
    parser.add_argument("--dense_dim", type=int, default=400, help="dim of PCA")
    parser.add_argument("--gpu", type=int, default=0, help="GPU id, set to -1 for CPU")
    parser.add_argument("--log_level", type=str, default="INFO", choices=get_args(LogLevel))
    parser.add_argument("--species", default="human")
    parser.add_argument("--test_dataset", nargs="+", default=[138], type=int, help="list of test dataset IDs")
    parser.add_argument("--tissue", default="Brain")  # TODO: Add option for different tissue name for train/test
    parser.add_argument("--train_dataset", nargs="+", default=[328], type=int, help="list of training dataset IDs")
    parser.add_argument("--valid_dataset", nargs="+", default=None, type=int, help="list of validation dataset IDs")
    parser.add_argument("--tune_mode", default="pipeline_params", choices=["pipeline", "params", "pipeline_params"])
    parser.add_argument("--seed", type=int, default=10)
    parser.add_argument("--count", type=int, default=2)
    parser.add_argument("--sweep_id", type=str, default=None)
    parser.add_argument("--summary_file_path", default="results/pipeline/best_test_acc.csv", type=str)
    parser.add_argument("--root_path", default=root_path, type=str)
    if args is None:
        args = parser.parse_args()
    else:
        args = parser.parse_args(args)

    # Construct the path to the tuning config file
    file_root_path = Path(
        args.root_path, "_".join([
            "-".join([str(num) for num in dataset])
            for dataset in [args.train_dataset, args.valid_dataset, args.test_dataset] if dataset is not None
        ])).resolve()
    logger.info(f"\n files is saved in {file_root_path}")

    # Instantiate pipeline planer from config file
    pipeline_planer = PipelinePlaner.from_config_file(f"{file_root_path}/{args.tune_mode}_tuning_config.yaml")
    # Raise the number of initial run failures the wandb agent tolerates before stopping the sweep
    os.environ["WANDB_AGENT_MAX_INITIAL_FAILURES"] = "2000"

    # Step 3: define the evaluation function
    def evaluate_pipeline(tune_mode=args.tune_mode, pipeline_planer=pipeline_planer):
        """
        The evaluation function called by wandb_sweep_agent for each run. It:
        1. Loads the data.
        2. Applies the preprocessing pipeline sampled for this run.
        3. Trains the model.
        4. Evaluates it on the training, validation, and test splits.
        5. Logs the metrics to wandb.
        """
        wandb.init(settings=wandb.Settings(start_method='thread'))
        set_seed(args.seed)

        # Load dataset
        data = CellTypeAnnotationDataset(train_dataset=args.train_dataset, test_dataset=args.test_dataset,
                                         valid_dataset=args.valid_dataset, species=args.species, tissue=args.tissue,
                                         data_dir="../temp_data").load_data()

        # Preprocessing pipeline
        kwargs = {tune_mode: dict(wandb.config)}
        preprocessing_pipeline = pipeline_planer.generate(**kwargs)
        preprocessing_pipeline(data)

        # Retrieve the training / validation / test splits
        x_train, y_train = data.get_train_data()
        y_train_converted = y_train.argmax(1)  # one-hot labels -> integer class indices for SVC
        x_valid, y_valid = data.get_val_data()
        x_test, y_test = data.get_test_data()

        # Initialize the custom SVM model (defined above, or imported) and train it
        model = SVM(args, random_state=args.seed)
        model.fit(x_train, y_train_converted)

        # Evaluate the model (score() receives the one-hot labels as returned by the dataset)
        train_score = model.score(x_train, y_train)
        score = model.score(x_valid, y_valid)
        test_score = model.score(x_test, y_test)

        # Log results to wandb
        wandb.log({"train_acc": train_score, "acc": score, "test_acc": test_score})
        wandb.finish()

    # Step 4: Run the sweep
    entity, project, sweep_id = pipeline_planer.wandb_sweep_agent(
        evaluate_pipeline, sweep_id=args.sweep_id, count=args.count)

    # Step 5: save summary data (top results, etc.)
    save_summary_data(entity, project, sweep_id, summary_file_path=args.summary_file_path, root_path=file_root_path)

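    # For pipeline tuning, generate the second-stage (params) tuning configs from the saved summary;
    # in pipeline_params mode, also run that second-stage sweep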
    if args.tune_mode == "pipeline" or args.tune_mode == "pipeline_params":
        get_step3_yaml(result_load_path=f"{args.summary_file_path}", step2_pipeline_planer=pipeline_planer,
                       conf_load_path=f"{Path(args.root_path).resolve().parent}/step3_default_params.yaml",
                       root_path=file_root_path)
        if args.tune_mode == "pipeline_params":
            run_step3(file_root_path, evaluate_pipeline, tune_mode="params", step2_pipeline_planer=pipeline_planer)


if __name__ == "__main__":
    # In a notebook, sys.argv holds the kernel's own arguments, so parse an empty list instead;
    # when run as a script, parse the real command-line arguments.
    main(None if "__file__" in globals() else [])
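In `evaluate_pipeline`, `dict(wandb.config)` is a flat mapping whose dotted keys encode the step index, as the agent output shows (e.g. `pipeline.3.feature.cell: CellPCA`, `params.3.CellPCA.n_components: 227`). The stdlib sketch below groups such `params.*` keys into one dict per step, mirroring the "Params plan" printed in the logs. `group_params_by_step` is a hypothetical helper for illustration only; in the real pipeline, `PipelinePlaner.generate` performs this conversion and may handle details (such as missing steps) differently.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_params_by_step(config: Dict[str, Any], prefix: str = "params") -> List[Dict[str, Any]]:
    """Group flat keys like 'params.3.CellPCA.n_components' into one dict per step."""
    steps: Dict[int, Dict[str, Any]] = defaultdict(dict)
    for key, value in config.items():
        parts = key.split(".")
        if parts[0] != prefix:
            continue  # ignore keys from other tuning modes
        # parts = [prefix, step index, transform name, parameter name]
        step, param = int(parts[1]), parts[-1]
        steps[step][param] = value
    # Assumes step indices are contiguous; a gap would silently shift later steps.
    return [steps[i] for i in sorted(steps)]


# Config of the first params run (69ew4oa2), copied from the agent output
sweep_config = {
    "params.0.FilterGenesPercentile.max_val": 98,
    "params.0.FilterGenesPercentile.min_val": 8,
    "params.0.FilterGenesPercentile.mode": "rv",
    "params.1.ColumnSumNormalize.eps": 0.7,
    "params.1.ColumnSumNormalize.mode": "minmax",
    "params.2.FilterGenesRegression.method": "scmap",
    "params.2.FilterGenesRegression.num_genes": 5388,
    "params.3.CellPCA.n_components": 227,
    "params.3.CellPCA.svd_solver": "arpack",
}
for step_params in group_params_by_step(sweep_config):
    print(step_params)
```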

[INFO][2025-08-20 12:25:15,393][dance][main]
 files is saved in /home/zyxing/dance/examples/tuning/custom-methods/328_138
[INFO][2025-08-20 12:25:15,411][dance][config] tune mode is set to pipeline_params, tune_mode will first be converted to pipeline
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[INFO][2025-08-20 12:25:17,271][dance][wandb_sweep] 

        [*] Sweep ID: 15layk3y

[INFO][2025-08-20 12:25:17,272][dance][wandb_sweep_agent] Spawning agent: sweep_id='15layk3y', entity='xzy11632', project='dance-dev', count=2

Create sweep with ID: 15layk3y
Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y

wandb: Agent Starting Run: dduxrl3d with config:
wandb:      pipeline.0.filter.gene: FilterGenesPercentile
wandb:      pipeline.1.normalize: ColumnSumNormalize
wandb:      pipeline.2.filter.gene: FilterGenesRegression
wandb:      pipeline.3.feature.cell: CellPCA
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: xzy11632. Use `wandb login --relogin` to force relogin

wandb version 0.21.1 is available! To upgrade, please run: $ pip install wandb --upgrade

Tracking run with wandb version 0.16.3

Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122520-dduxrl3d

Syncing run rural-sweep-1 to Weights & Biases (docs)
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y

View project at https://wandb.ai/xzy11632/dance-dev

View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y

View run at https://wandb.ai/xzy11632/dance-dev/runs/dduxrl3d

[INFO][2025-08-20 12:25:32,694][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:25:32,696][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:25:32,983][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:25:33,104][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:25:33,107][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
[INFO][2025-08-20 12:25:33,354][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:25:33,355][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:25:33,356][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:25:33,357][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:25:33,357][dance][_load_raw_data] Cell-types (n=9):
['OPC',
 'astrocytes',
 'endothelial',
 'fetal_quiescent',
 'fetal_replicating',
 'hybrid',
 'microglia',
 'neurons',
 'oligodendrocytes']
[INFO][2025-08-20 12:25:33,360][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
    uns: 'dance_config'
    obsm: 'cell_type'
[INFO][2025-08-20 12:25:33,361][dance][wrapped_func] Took 0:00:00.665608 to load and process data.
[INFO][2025-08-20 12:25:33,361][dance][generate_config] The content in pipeline_params will be converted to pipeline
[INFO][2025-08-20 12:25:33,363][dance][_sanitize_pipeline] Pipeline plan:
['FilterGenesPercentile',
 'ColumnSumNormalize',
 'FilterGenesRegression',
 'CellPCA',
 None]
[WARNING][2025-08-20 12:25:33,370][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:25:33,376][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
  warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:25:33,618][dance][_filter_enclasc] Start generating cell features using EnClaSC
[WARNING][2025-08-20 12:25:33,641][dance.CellPCA][__call__] n_components=400 must be between 0 and min(n_samples, n_features)=100 with svd_solver='auto'
[INFO][2025-08-20 12:25:33,652][dance.CellPCA][__call__] Generating cell PCA features (466, 100) (k=100)
[INFO][2025-08-20 12:25:33,654][dance.CellPCA][__call__] Top 10 explained variances: [0.11390967 0.07235937 0.05432951 0.04682069 0.04452541 0.03371136
 0.02961524 0.0270735  0.025507   0.02293849]
[INFO][2025-08-20 12:25:33,655][dance.CellPCA][__call__] Total explained variance: 100.00%
[INFO][2025-08-20 12:25:33,657][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'
[INFO][2025-08-20 12:25:33,658][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'

Run history:

acc	▁
test_acc	▁
train_acc	▁

Run summary:

acc	0.51515
test_acc	0.10145
train_acc	0.65649

View run rural-sweep-1 at: https://wandb.ai/xzy11632/dance-dev/runs/dduxrl3d
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)

Find logs at: ./wandb/run-20250820_122520-dduxrl3d/logs

wandb: Agent Starting Run: nxe1pfd6 with config:
wandb:      pipeline.0.filter.gene: FilterGenesPercentile
wandb:      pipeline.1.normalize: ColumnSumNormalize
wandb:      pipeline.2.filter.gene: FilterGenesRegression
wandb:      pipeline.3.feature.cell: CellSVD
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.

wandb version 0.21.1 is available! To upgrade, please run: $ pip install wandb --upgrade

Tracking run with wandb version 0.16.3

Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122547-nxe1pfd6

Syncing run olive-sweep-2 to Weights & Biases (docs)
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y

View project at https://wandb.ai/xzy11632/dance-dev

View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y

View run at https://wandb.ai/xzy11632/dance-dev/runs/nxe1pfd6

[INFO][2025-08-20 12:25:58,365][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:25:58,368][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:25:58,770][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:25:58,914][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:25:58,919][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
[INFO][2025-08-20 12:25:59,103][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:25:59,104][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:25:59,105][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:25:59,106][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:25:59,107][dance][_load_raw_data] Cell-types (n=9):
['OPC',
 'astrocytes',
 'endothelial',
 'fetal_quiescent',
 'fetal_replicating',
 'hybrid',
 'microglia',
 'neurons',
 'oligodendrocytes']
[INFO][2025-08-20 12:25:59,110][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
    uns: 'dance_config'
    obsm: 'cell_type'
[INFO][2025-08-20 12:25:59,111][dance][wrapped_func] Took 0:00:00.743655 to load and process data.
[INFO][2025-08-20 12:25:59,111][dance][generate_config] The content in pipeline_params will be converted to pipeline
[INFO][2025-08-20 12:25:59,114][dance][_sanitize_pipeline] Pipeline plan:
['FilterGenesPercentile',
 'ColumnSumNormalize',
 'FilterGenesRegression',
 'CellSVD',
 None]
[WARNING][2025-08-20 12:25:59,121][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:25:59,127][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
  warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:25:59,320][dance][_filter_enclasc] Start generating cell features using EnClaSC
[WARNING][2025-08-20 12:25:59,340][dance.CellSVD][__call__] n_components=400 must be between 0 and min(n_samples, n_features)=100 with svd_solver='full'
[INFO][2025-08-20 12:25:59,388][dance.CellSVD][__call__] Generating cell SVD features (466, 100) (k=100)
[INFO][2025-08-20 12:25:59,389][dance.CellSVD][__call__] Top 10 explained variances: [0.0475235  0.10532387 0.06999493 0.05225454 0.04628773 0.03516576
 0.03369351 0.02756703 0.02625577 0.02297754]
[INFO][2025-08-20 12:25:59,390][dance.CellSVD][__call__] Total explained variance: 100.00%
[INFO][2025-08-20 12:25:59,391][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'
[INFO][2025-08-20 12:25:59,392][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'

Run history:

acc	▁
test_acc	▁
train_acc	▁

Run summary:

acc	0.5
test_acc	0.0942
train_acc	0.64122

View run olive-sweep-2 at: https://wandb.ai/xzy11632/dance-dev/runs/nxe1pfd6
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)

Find logs at: ./wandb/run-20250820_122547-nxe1pfd6/logs

[INFO][2025-08-20 12:26:09,505][dance][wandb_sweep] 

        [*] Sweep ID: 1f9pschy

[INFO][2025-08-20 12:26:09,506][dance][wandb_sweep_agent] Spawning agent: sweep_id='1f9pschy', entity='xzy11632', project='dance-dev', count=2

Create sweep with ID: 1f9pschy
Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy

wandb: Agent Starting Run: 69ew4oa2 with config:
wandb:      params.0.FilterGenesPercentile.max_val: 98
wandb:      params.0.FilterGenesPercentile.min_val: 8
wandb:      params.0.FilterGenesPercentile.mode: rv
wandb:      params.1.ColumnSumNormalize.eps: 0.7
wandb:      params.1.ColumnSumNormalize.mode: minmax
wandb:      params.2.FilterGenesRegression.method: scmap
wandb:      params.2.FilterGenesRegression.num_genes: 5388
wandb:      params.3.CellPCA.n_components: 227
wandb:      params.3.CellPCA.svd_solver: arpack
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.

wandb version 0.21.1 is available! To upgrade, please run: $ pip install wandb --upgrade

Tracking run with wandb version 0.16.3

Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122613-69ew4oa2

Syncing run morning-sweep-1 to Weights & Biases (docs)
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy

View project at https://wandb.ai/xzy11632/dance-dev

View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy

View run at https://wandb.ai/xzy11632/dance-dev/runs/69ew4oa2

[INFO][2025-08-20 12:26:24,459][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:26:24,461][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:26:24,854][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:26:24,996][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:26:25,000][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
[INFO][2025-08-20 12:26:25,171][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:26:25,173][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:26:25,173][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:26:25,175][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:26:25,175][dance][_load_raw_data] Cell-types (n=9):
['OPC',
 'astrocytes',
 'endothelial',
 'fetal_quiescent',
 'fetal_replicating',
 'hybrid',
 'microglia',
 'neurons',
 'oligodendrocytes']
[INFO][2025-08-20 12:26:25,179][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
    uns: 'dance_config'
    obsm: 'cell_type'
[INFO][2025-08-20 12:26:25,180][dance][wrapped_func] Took 0:00:00.720475 to load and process data.
[INFO][2025-08-20 12:26:25,182][dance][_sanitize_params] Params plan:
[{'max_val': 98, 'min_val': 8, 'mode': 'rv'},
 {'eps': 0.7, 'mode': 'minmax'},
 {'method': 'scmap', 'num_genes': 5388},
 {'n_components': 227, 'svd_solver': 'arpack'},
 None]
[WARNING][2025-08-20 12:26:25,193][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:26:25,200][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:490: RuntimeWarning: invalid value encountered in divide
  gene_summary = np.nan_to_num(np.array(x.var(0) / x.mean(0)), posinf=0, neginf=0).ravel()
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
  warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:26:25,452][dance][_filter_scmap] Start generating cell features using scmap
[INFO][2025-08-20 12:26:26,328][dance.CellPCA][__call__] Generating cell PCA features (466, 5388) (k=227)
[INFO][2025-08-20 12:26:26,330][dance.CellPCA][__call__] Top 10 explained variances: [0.05885801 0.0215941  0.01331656 0.01141509 0.00957707 0.00813536
 0.00765066 0.00701284 0.00689292 0.00644914]
[INFO][2025-08-20 12:26:26,331][dance.CellPCA][__call__] Total explained variance: 76.84%
[INFO][2025-08-20 12:26:26,332][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'
[INFO][2025-08-20 12:26:26,333][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'

Run history:

acc	▁
test_acc	▁
train_acc	▁

Run summary:

acc	0.71212
test_acc	0.36232
train_acc	0.89695

View run morning-sweep-1 at: https://wandb.ai/xzy11632/dance-dev/runs/69ew4oa2
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)

Find logs at: ./wandb/run-20250820_122613-69ew4oa2/logs

wandb: Agent Starting Run: h8j0oo74 with config:
wandb:      params.0.FilterGenesPercentile.max_val: 96
wandb:      params.0.FilterGenesPercentile.min_val: 1
wandb:      params.0.FilterGenesPercentile.mode: cv
wandb:      params.1.ColumnSumNormalize.eps: 0.3
wandb:      params.1.ColumnSumNormalize.mode: standardize
wandb:      params.2.FilterGenesRegression.method: seurat3
wandb:      params.2.FilterGenesRegression.num_genes: 9419
wandb:      params.3.CellPCA.n_components: 636
wandb:      params.3.CellPCA.svd_solver: arpack
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.

wandb version 0.21.1 is available! To upgrade, please run: $ pip install wandb --upgrade

Tracking run with wandb version 0.16.3

Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122640-h8j0oo74

Syncing run earnest-sweep-2 to Weights & Biases (docs)
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy

View project at https://wandb.ai/xzy11632/dance-dev

View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy

View run at https://wandb.ai/xzy11632/dance-dev/runs/h8j0oo74

[INFO][2025-08-20 12:26:51,507][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:26:51,512][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:26:51,877][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:26:52,016][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:26:52,020][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
[INFO][2025-08-20 12:26:52,175][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:26:52,177][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:26:52,177][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:26:52,178][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:26:52,179][dance][_load_raw_data] Cell-types (n=9):
['OPC',
 'astrocytes',
 'endothelial',
 'fetal_quiescent',
 'fetal_replicating',
 'hybrid',
 'microglia',
 'neurons',
 'oligodendrocytes']
[INFO][2025-08-20 12:26:52,181][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
    uns: 'dance_config'
    obsm: 'cell_type'
[INFO][2025-08-20 12:26:52,182][dance][wrapped_func] Took 0:00:00.670366 to load and process data.
[INFO][2025-08-20 12:26:52,183][dance][_sanitize_params] Params plan:
[{'max_val': 96, 'min_val': 1, 'mode': 'cv'},
 {'eps': 0.3, 'mode': 'standardize'},
 {'method': 'seurat3', 'num_genes': 9419},
 {'n_components': 636, 'svd_solver': 'arpack'},
 None]
[WARNING][2025-08-20 12:26:52,190][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:26:52,195][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:488: RuntimeWarning: invalid value encountered in divide
  gene_summary = np.nan_to_num(np.array(x.std(0) / x.mean(0)), posinf=0, neginf=0).ravel()
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
  warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:26:52,411][dance][_filter_seurat3] Start generating cell features using Seurat v3.0
[WARNING][2025-08-20 12:26:52,460][dance.CellPCA][__call__] n_components=636 must be between 0 and min(n_samples, n_features)=466 with svd_solver='arpack'

View run earnest-sweep-2 at: https://wandb.ai/xzy11632/dance-dev/runs/h8j0oo74
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)

Find logs at: ./wandb/run-20250820_122640-h8j0oo74/logs

Run h8j0oo74 errored:
Traceback (most recent call last):
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py", line 308, in _run_job
    self._function()
  File "/tmp/ipykernel_715844/3979844991.py", line 88, in evaluate_pipeline
    preprocessing_pipeline(data)
  File "/home/zyxing/dance/dance/pipeline.py", line 128, in __call__
    return self.functional(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/dance/dance/pipeline.py", line 247, in bounded_functional
    a(*args, **kwargs)
  File "/home/zyxing/dance/dance/pipeline.py", line 128, in __call__
    return self.functional(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/dance/dance/utils/wrappers.py", line 128, in new_call
    return original_call(self, data, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/dance/dance/transforms/cell_feature.py", line 177, in __call__
    cell_feat = pca.fit_transform(feat)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py", line 460, in fit_transform
    U, S, Vt = self._fit(X)
               ^^^^^^^^^^^^
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py", line 512, in _fit
    return self._fit_truncated(X, n_components, self._fit_svd_solver)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py", line 592, in _fit_truncated
    raise ValueError(
ValueError: n_components=466 must be strictly less than min(n_samples, n_features)=466 with svd_solver='arpack'

[INFO][2025-08-20 12:27:02,289][dance][wandb_sweep] 

        [*] Sweep ID: cyuki0fw

[INFO][2025-08-20 12:27:02,289][dance][wandb_sweep_agent] Spawning agent: sweep_id='cyuki0fw', entity='xzy11632', project='dance-dev', count=2

Create sweep with ID: cyuki0fw
Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw

wandb: Agent Starting Run: jhryorbj with config:
wandb:      params.0.FilterGenesPercentile.max_val: 98
wandb:      params.0.FilterGenesPercentile.min_val: 4
wandb:      params.0.FilterGenesPercentile.mode: sum
wandb:      params.1.ColumnSumNormalize.eps: 0.1
wandb:      params.1.ColumnSumNormalize.mode: minmax
wandb:      params.2.FilterGenesRegression.method: scmap
wandb:      params.2.FilterGenesRegression.num_genes: 7435
wandb:      params.3.CellSVD.algorithm: arpack
wandb:      params.3.CellSVD.n_components: 793
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.

Tracking run with wandb version 0.16.3

Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122705-jhryorbj

Syncing run clear-sweep-1 to Weights & Biases
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw

View project at https://wandb.ai/xzy11632/dance-dev

View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw

View run at https://wandb.ai/xzy11632/dance-dev/runs/jhryorbj

[INFO][2025-08-20 12:27:16,705][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:27:16,707][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:27:17,099][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:27:17,243][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:27:17,247][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
[INFO][2025-08-20 12:27:17,424][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:27:17,425][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:27:17,426][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:27:17,427][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:27:17,428][dance][_load_raw_data] Cell-types (n=9):
['OPC',
 'astrocytes',
 'endothelial',
 'fetal_quiescent',
 'fetal_replicating',
 'hybrid',
 'microglia',
 'neurons',
 'oligodendrocytes']
[INFO][2025-08-20 12:27:17,430][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
    uns: 'dance_config'
    obsm: 'cell_type'
[INFO][2025-08-20 12:27:17,430][dance][wrapped_func] Took 0:00:00.723811 to load and process data.
[INFO][2025-08-20 12:27:17,432][dance][_sanitize_params] Params plan:
[{'max_val': 98, 'min_val': 4, 'mode': 'sum'},
 {'eps': 0.1, 'mode': 'minmax'},
 {'method': 'scmap', 'num_genes': 7435},
 {'algorithm': 'arpack', 'n_components': 793},
 None]
[WARNING][2025-08-20 12:27:17,442][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:27:17,448][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
  warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:27:17,628][dance][_filter_scmap] Start generating cell features using scmap
[WARNING][2025-08-20 12:27:17,662][dance.CellSVD][__call__] n_components=793 must be between 0 and min(n_samples, n_features)=466 with svd_solver='full'

View run clear-sweep-1 at: https://wandb.ai/xzy11632/dance-dev/runs/jhryorbj
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)

Find logs at: ./wandb/run-20250820_122705-jhryorbj/logs

Run jhryorbj errored:
Traceback (most recent call last):
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py", line 308, in _run_job
    self._function()
  File "/tmp/ipykernel_715844/3979844991.py", line 88, in evaluate_pipeline
    preprocessing_pipeline(data)
  File "/home/zyxing/dance/dance/pipeline.py", line 128, in __call__
    return self.functional(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/dance/dance/pipeline.py", line 247, in bounded_functional
    a(*args, **kwargs)
  File "/home/zyxing/dance/dance/pipeline.py", line 128, in __call__
    return self.functional(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/dance/dance/utils/wrappers.py", line 128, in new_call
    return original_call(self, data, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/dance/dance/transforms/cell_feature.py", line 275, in __call__
    cell_feat = svd.fit_transform(feat)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_truncated_svd.py", line 234, in fit_transform
    U, Sigma, VT = svds(X, k=self.n_components, tol=self.tol, v0=v0)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py", line 438, in svds
    args = _iv(A, k, ncv, tol, which, v0, maxiter, return_singular_vectors,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py", line 44, in _iv
    raise ValueError(message)
ValueError: `k` must be an integer satisfying `0 < k < min(A.shape)`.

wandb: Agent Starting Run: 9bwt95uk with config:
wandb:      params.0.FilterGenesPercentile.max_val: 95
wandb:      params.0.FilterGenesPercentile.min_val: 6
wandb:      params.0.FilterGenesPercentile.mode: var
wandb:      params.1.ColumnSumNormalize.eps: 0.5
wandb:      params.1.ColumnSumNormalize.mode: normalize
wandb:      params.2.FilterGenesRegression.method: scmap
wandb:      params.2.FilterGenesRegression.num_genes: 1609
wandb:      params.3.CellSVD.algorithm: randomized
wandb:      params.3.CellSVD.n_components: 539
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.

Tracking run with wandb version 0.16.3

Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122733-9bwt95uk

Syncing run fallen-sweep-2 to Weights & Biases
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw

View project at https://wandb.ai/xzy11632/dance-dev

View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw

View run at https://wandb.ai/xzy11632/dance-dev/runs/9bwt95uk

[INFO][2025-08-20 12:27:44,260][dance][set_seed] Setting global random seed to 10
[INFO][2025-08-20 12:27:44,262][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv
[INFO][2025-08-20 12:27:44,657][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv
[INFO][2025-08-20 12:27:44,802][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv
[INFO][2025-08-20 12:27:44,806][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv
/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
[INFO][2025-08-20 12:27:44,982][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088
[INFO][2025-08-20 12:27:44,983][dance][_load_raw_data] Number of training samples: 262
[INFO][2025-08-20 12:27:44,984][dance][_load_raw_data] Number of valid samples: 66
[INFO][2025-08-20 12:27:44,984][dance][_load_raw_data] Number of testing samples: 138
[INFO][2025-08-20 12:27:44,985][dance][_load_raw_data] Cell-types (n=9):
['OPC',
 'astrocytes',
 'endothelial',
 'fetal_quiescent',
 'fetal_replicating',
 'hybrid',
 'microglia',
 'neurons',
 'oligodendrocytes']
[INFO][2025-08-20 12:27:44,987][dance][load_data] Raw data loaded:
Data object that wraps (.data):
AnnData object with n_obs × n_vars = 466 × 22088
    uns: 'dance_config'
    obsm: 'cell_type'
[INFO][2025-08-20 12:27:44,988][dance][wrapped_func] Took 0:00:00.726977 to load and process data.
[INFO][2025-08-20 12:27:44,990][dance][_sanitize_params] Params plan:
[{'max_val': 95, 'min_val': 6, 'mode': 'var'},
 {'eps': 0.5, 'mode': 'normalize'},
 {'method': 'scmap', 'num_genes': 1609},
 {'algorithm': 'randomized', 'n_components': 539},
 None]
[WARNING][2025-08-20 12:27:45,000][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data
[WARNING][2025-08-20 12:27:45,005][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data
/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.
  warnings.warn("Expecting count data as input, but the input feature matrix does not appear to be count."
[INFO][2025-08-20 12:27:45,232][dance][_filter_scmap] Start generating cell features using scmap
[WARNING][2025-08-20 12:27:45,256][dance.CellSVD][__call__] n_components=539 must be between 0 and min(n_samples, n_features)=466 with svd_solver='full'
[INFO][2025-08-20 12:27:46,636][dance.CellSVD][__call__] Generating cell SVD features (466, 1609) (k=466)
[INFO][2025-08-20 12:27:46,639][dance.CellSVD][__call__] Top 10 explained variances: [0.01398817 0.01243401 0.00987709 0.00972114 0.00953149 0.00923472
 0.00884004 0.00859845 0.00861041 0.00851821]
[INFO][2025-08-20 12:27:46,640][dance.CellSVD][__call__] Total explained variance: 100.00%
[INFO][2025-08-20 12:27:46,641][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'
[INFO][2025-08-20 12:27:46,642][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'

Run history:

acc	▁
test_acc	▁
train_acc	▁

Run summary:

acc	0.36364
test_acc	0.06522
train_acc	0.67176

View run fallen-sweep-2 at: https://wandb.ai/xzy11632/dance-dev/runs/9bwt95uk
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)

Find logs at: ./wandb/run-20250820_122733-9bwt95uk/logs
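In the runs above, CellPCA and CellSVD clamp an oversized n_components (636 and 793) down to min(n_samples, n_features) = 466, but scikit-learn's ARPACK path requires a value strictly below that bound, so both arpack runs error out, while the randomized run (539, clamped to 466) finishes. If you write your own feature step or want to sanity-check sampled values, a guard like the one below avoids this. `safe_n_components` is a hypothetical helper, not part of DANCE; it encodes the rule from the error messages above ("strictly less than" for arpack) and uses the same bound as a conservative cap for other solvers.

```python
import numpy as np
from sklearn.decomposition import PCA


def safe_n_components(requested: int, n_samples: int, n_features: int, solver: str) -> int:
    """Clamp a sampled n_components to a value the chosen solver accepts (hypothetical helper)."""
    limit = min(n_samples, n_features)
    if solver == "arpack":
        # ARPACK (scipy.sparse.linalg.svds) needs 0 < k < min(n_samples, n_features)
        limit -= 1
    # Never return less than one component
    return max(1, min(requested, limit))


# 466 cells as in the runs above; 22088 is the raw gene count before filtering
print(safe_n_components(636, 466, 22088, "arpack"))     # clamped to 465
# Shape (466, 1609) as logged by the successful randomized run
print(safe_n_components(539, 466, 1609, "randomized"))  # clamped to 466

# Small synthetic check that the clamped value is accepted by PCA(svd_solver="arpack")
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 12))
k = safe_n_components(50, *X.shape, solver="arpack")
emb = PCA(n_components=k, svd_solver="arpack", random_state=0).fit_transform(X)
print(emb.shape)  # (30, 11)
```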

3. Auto-Search Configuration

The configuration files (e.g., pipeline_params_tuning_config.yaml, pipeline_tuning_config.yaml, params_tuning_config.yaml) guide the auto-search. Each file contains instructions for how to vary your preprocessing pipeline or model hyperparameters (or both). For example:

#pipeline_params_tuning_config.yaml

type: preprocessor
tune_mode: pipeline_params
pipeline_tuning_top_k: 2
parameter_tuning_freq_n: 2
pipeline:
  - type: filter.gene
    include:
      - FilterGenesPercentile
      - FilterGenesScanpyOrder
      - FilterGenesPlaceHolder
    default_params:
      FilterGenesScanpyOrder:
          order: ["min_counts", "min_cells", "max_counts", "max_cells"]
          min_counts: 1
          max_counts: 134732
          min_cells: 1
          max_cells: 401
  - type: normalize
    include:
      - ScaleFeature
      - ScTransform
      - Log1P
      - NormalizeTotal
      - NormalizePlaceHolder
    default_params:
      ScTransform:
        processes_num: 8
  - type: filter.gene
    include:
      # - HighlyVariableGenesLogarithmizedByMeanAndDisp
      - HighlyVariableGenesRawCount
      - HighlyVariableGenesLogarithmizedByTopGenes
      - FilterGenesTopK
      - FilterGenesRegression
      # - FilterGenesNumberPlaceHolder
    default_params:
      FilterGenesTopK:
        num_genes: 100
      FilterGenesRegression:
        num_genes: 100
      HighlyVariableGenesRawCount:
        n_top_genes: 100
      HighlyVariableGenesLogarithmizedByTopGenes:
        n_top_genes: 100
  - type: feature.cell
    include:
      - WeightedFeaturePCA
      - WeightedFeatureSVD
      - CellPCA
      - CellSVD
      - GaussRandProjFeature  # Registered custom preprocessing func
      - FeatureCellPlaceHolder
    params:
      out: feature.cell
      log_level: INFO
  - type: misc
    target: SetConfig
    params:
      config_dict:
        feature_channel: feature.cell
        label_channel: cell_type
wandb:
  entity: xzy11632
  project: dance-dev
  method: grid #try grid to provide a comprehensive search
  metric:
    name: acc  # val/acc
    goal: maximize
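To get a feel for how large the pipeline search space is, the sketch below mirrors the pipeline section of the YAML as a Python list and counts the combinations a grid search would have to visit. It assumes each step's include list becomes one categorical search dimension and that a step with a fixed target (such as SetConfig) contributes a single choice; this illustrates the arithmetic and is not DANCE's internal sweep builder. The commented-out YAML entries are omitted, just as a YAML parser would ignore them.

```python
import itertools
import math

# Hand-copied mirror of the `pipeline` section above (commented-out YAML entries omitted)
pipeline = [
    {"type": "filter.gene",
     "include": ["FilterGenesPercentile", "FilterGenesScanpyOrder", "FilterGenesPlaceHolder"]},
    {"type": "normalize",
     "include": ["ScaleFeature", "ScTransform", "Log1P", "NormalizeTotal", "NormalizePlaceHolder"]},
    {"type": "filter.gene",
     "include": ["HighlyVariableGenesRawCount", "HighlyVariableGenesLogarithmizedByTopGenes",
                 "FilterGenesTopK", "FilterGenesRegression"]},
    {"type": "feature.cell",
     "include": ["WeightedFeaturePCA", "WeightedFeatureSVD", "CellPCA", "CellSVD",
                 "GaussRandProjFeature", "FeatureCellPlaceHolder"]},
    {"type": "misc", "target": "SetConfig"},
]

# Each step offers either its `include` candidates or its single fixed `target`
choices = [step.get("include", [step.get("target")]) for step in pipeline]
n_combinations = math.prod(len(c) for c in choices)
print(n_combinations)  # 3 * 5 * 4 * 6 * 1 = 360

# First few candidate pipelines in grid order
for combo in itertools.islice(itertools.product(*choices), 3):
    print(" -> ".join(combo))
```

Because the count is a product, every extra candidate in one step multiplies the total, which is worth keeping in mind before switching method to grid.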

Tips:

In tune_mode=pipeline, the system will only tune the preprocessing pipeline.
In tune_mode=params, the system will only tune the model parameters.
In tune_mode=pipeline_params, the system will do a two-stage search: first for pipelines, then for model parameters.

4. Testing & Execution

After setting everything up:

# Search only the best preprocessing pipeline:
python main.py --tune_mode pipeline

# Search only the best model hyperparameters:
python main.py --tune_mode params

# Joint two-stage search for both pipeline and parameters:
python main.py --tune_mode pipeline_params
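If you are writing your own main.py, one way to expose this switch is an argparse option restricted to the three modes, plus a table mapping each mode to the stages listed in the tips above. This is a hypothetical skeleton: only the --tune_mode flag and its three values come from this tutorial, and the print call stands in for whatever actually launches each sweep.

```python
import argparse
from typing import List, Optional, Sequence

# Ordered search stages per mode, following the tips above (stage names are placeholders)
STAGES = {
    "pipeline": ["pipeline"],
    "params": ["params"],
    "pipeline_params": ["pipeline", "params"],
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Auto-search driver (illustrative skeleton)")
    # `choices` makes argparse reject typos such as --tune_mode pipline
    parser.add_argument("--tune_mode", choices=list(STAGES), required=True)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> List[str]:
    args = build_parser().parse_args(argv)
    stages = STAGES[args.tune_mode]
    for stage in stages:
        print(f"running {stage} search")  # replace with the real sweep launcher
    return stages


# Equivalent to: python main.py --tune_mode pipeline_params
main(["--tune_mode", "pipeline_params"])
```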

Once the search completes, the results are logged to Weights & Biases (wandb). The save_summary_data function writes a CSV of the top-performing runs. If you selected pipeline_params, the script also generates a default parameter config for the second stage of the search, which run_step3 then runs automatically.
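The exact output of save_summary_data is not shown in this tutorial, but the idea of keeping the best runs can be sketched with the standard library: drop runs that never reported the sweep metric, sort the rest by acc (goal: maximize in the config above), and write the best k to a CSV. `top_k_runs` is a hypothetical helper; the run IDs and the acc value are taken from the sweep output shown earlier, and k=2 is borrowed from pipeline_tuning_top_k: 2, assuming that setting controls how many top pipelines are kept.

```python
import csv
import io
from typing import List


def top_k_runs(runs: List[dict], metric: str = "acc", k: int = 2, maximize: bool = True) -> List[dict]:
    """Keep runs that reported `metric`, sort them, and return the best k (hypothetical helper)."""
    finished = [r for r in runs if r.get(metric) is not None]
    return sorted(finished, key=lambda r: r[metric], reverse=maximize)[:k]


# Runs from the sweep output above; the errored runs logged no metric
runs = [
    {"run_id": "h8j0oo74", "acc": None},     # errored in CellPCA (arpack)
    {"run_id": "jhryorbj", "acc": None},     # errored in CellSVD (arpack)
    {"run_id": "9bwt95uk", "acc": 0.36364},  # finished; value from its run summary
]

best = top_k_runs(runs, metric="acc", k=2)

# In-memory buffer for the demo; use open("summary.csv", "w", newline="") to write a file
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["run_id", "acc"])
writer.writeheader()
writer.writerows(best)
print(buf.getvalue())
```

Here only one run survives the filter, which is a reminder that errored runs (like the arpack failures above) silently shrink the pool of candidates.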

5. Summary

To recap, plugging a custom method into the auto-search framework comes down to five steps:

Inherit from the appropriate base class (in our case, BaseClassificationMethod).
Implement the fit, predict, and (optionally) preprocessing_pipeline methods.
Integrate your custom model into the main.py script.
Create and reference the necessary configuration (YAML) files.
Run the search with --tune_mode (pipeline | params | pipeline_params).

The same recipe applies to any custom algorithm, from simple classifiers such as an SVM to deep learning methods with pretraining steps.

Happy coding and good luck with your hyperparameter searches!


3. Example `main.py` File