maxfuse.model.Fusor.split_into_batches

Fusor.split_into_batches(max_outward_size=5000, matching_ratio=3, metacell_size=2, method='random', batching_scheme='pairwise', prebatching_smoothing=False, shared_wt1=1, shared_wt2=1, active_wt1=1, active_wt2=1, shared_svd_components1=None, shared_svd_components2=None, active_svd_components1=None, active_svd_components2=None, randomized_svd=False, svd_runs=1, seed=None, verbose=True)

Split the data into batches.

Parameters:
  • max_outward_size (int, default=5000) – Maximum number of cells in arr1 to match.

  • matching_ratio (int, default=3) – Average number of cells in arr2 that one cell in arr1 is matched to.

  • metacell_size (int, default=2) – Average number of cells in arr1 that are aggregated into one metacell.

  • method (str, default='random') – Either ‘random’ (split the data at random) or ‘binning’ (bin by the largest singular vector).

  • batching_scheme (str, default='pairwise') –

    Either ‘cyclic’ or ‘pairwise’.

    If ‘cyclic’, batches in the two datasets are paired in a cyclic fashion. If ‘pairwise’, all possible pairs of batches are considered.

  • prebatching_smoothing (bool, default=False) – Whether to smooth the data towards its low-rank approximation before batching.

  • shared_wt1 (float, default=1) – The shrinkage weight to put on the raw data for shared_arr1.

  • shared_wt2 (float, default=1) – The shrinkage weight to put on the raw data for shared_arr2.

  • active_wt1 (float, default=1) – The shrinkage weight to put on the raw data for active_arr1.

  • active_wt2 (float, default=1) – The shrinkage weight to put on the raw data for active_arr2.

  • shared_svd_components1 (None or int, default=None) – If not None, perform SVD to reduce the dimension of self.shared_arr1.

  • shared_svd_components2 (None or int, default=None) – If not None, perform SVD to reduce the dimension of self.shared_arr2.

  • active_svd_components1 (None or int, default=None) – If not None, perform SVD to reduce the dimension of self.active_arr1.

  • active_svd_components2 (None or int, default=None) – If not None, perform SVD to reduce the dimension of self.active_arr2.

  • randomized_svd (bool, default=False) – Whether to use randomized SVD.

  • svd_runs (int, default=1) – Perform multiple runs of SVD and select the one with the lowest Frobenius reconstruction error.

  • seed (None or int, default=None) – Numpy random seed.

  • verbose (bool, default=True) – Whether to print the progress.

Returns:

None
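
Example:

The sketch below illustrates two ideas from the parameter list: a seeded random split (method='random', seed) and the difference between the ‘cyclic’ and ‘pairwise’ batching schemes. It is not MaxFuse's implementation. The helper names random_batches and pair_batches are hypothetical, and the wrap-around reading of ‘cyclic’ is an assumption based only on the description above.

```python
import itertools
import math
import random


def random_batches(n_cells, max_batch_size, seed=None):
    """Shuffle cell indices and cut them into near-equal batches.

    Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    indices = list(range(n_cells))
    rng.shuffle(indices)
    n_batches = math.ceil(n_cells / max_batch_size)
    # Striding spreads cells evenly, so batch sizes differ by at most one
    # and never exceed max_batch_size.
    return [sorted(indices[i::n_batches]) for i in range(n_batches)]


def pair_batches(n_batches1, n_batches2, batching_scheme="pairwise"):
    """Return (arr1 batch index, arr2 batch index) pairs.

    Hypothetical helper, for illustration only.
    """
    if batching_scheme == "pairwise":
        # Every batch of arr1 is paired with every batch of arr2.
        return list(itertools.product(range(n_batches1), range(n_batches2)))
    if batching_scheme == "cyclic":
        # Assumed interpretation: step through both batch lists together,
        # wrapping around the shorter one.
        n_pairs = max(n_batches1, n_batches2)
        return [(i % n_batches1, i % n_batches2) for i in range(n_pairs)]
    raise ValueError("batching_scheme must be 'cyclic' or 'pairwise'")


batches1 = random_batches(n_cells=12, max_batch_size=5, seed=0)
batches2 = random_batches(n_cells=7, max_batch_size=5, seed=0)
print([len(b) for b in batches1])  # [4, 4, 4]
print([len(b) for b in batches2])  # [4, 3]
print(pair_batches(len(batches1), len(batches2), "pairwise"))
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
print(pair_batches(len(batches1), len(batches2), "cyclic"))
# [(0, 0), (1, 1), (2, 0)]
```

Under this reading, ‘pairwise’ produces n_batches1 × n_batches2 batch pairs, while ‘cyclic’ produces only max(n_batches1, n_batches2), so the choice trades matching coverage against the number of batch pairs to process.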