maxfuse.model.Fusor.split_into_batches
- Fusor.split_into_batches(max_outward_size=5000, matching_ratio=3, metacell_size=2, method='random', batching_scheme='pairwise', prebatching_smoothing=False, shared_wt1=1, shared_wt2=1, active_wt1=1, active_wt2=1, shared_svd_components1=None, shared_svd_components2=None, active_svd_components1=None, active_svd_components2=None, randomized_svd=False, svd_runs=1, seed=None, verbose=True)
Split the data into batches.
- Parameters:
max_outward_size (int, default=5000) – Maximum number of cells to match in arr1.
matching_ratio (int, default=3) – Average number of cells in arr2 that each cell in arr1 is matched to.
metacell_size (int, default=2) – Average number of arr1 cells aggregated into one metacell.
method (str, default='random') – Either ‘random’, which splits the cells at random, or ‘binning’, which bins the cells by the largest singular vector.
batching_scheme (str, default='pairwise') – Either ‘cyclic’ or ‘pairwise’. If ‘cyclic’, batches of the two datasets are paired in a cyclic fashion; if ‘pairwise’, all possible combinations of pairs of batches are considered.
prebatching_smoothing (bool, default=False) – Whether to smooth the data towards its low-rank approximation before batching.
shared_wt1 (float, default=1) – The shrinkage weight to put on the raw data for shared_arr1.
shared_wt2 (float, default=1) – The shrinkage weight to put on the raw data for shared_arr2.
active_wt1 (float, default=1) – The shrinkage weight to put on the raw data for active_arr1.
active_wt2 (float, default=1) – The shrinkage weight to put on the raw data for active_arr2.
shared_svd_components1 (None or int, default=None) – If not None, perform SVD to reduce the dimension of self.shared_arr1.
shared_svd_components2 (None or int, default=None) – If not None, perform SVD to reduce the dimension of self.shared_arr2.
active_svd_components1 (None or int, default=None) – If not None, perform SVD to reduce the dimension of self.active_arr1.
active_svd_components2 (None or int, default=None) – If not None, perform SVD to reduce the dimension of self.active_arr2.
randomized_svd (bool, default=False) – Whether to use randomized SVD.
svd_runs (int, default=1) – Number of SVD runs to perform; the run with the lowest Frobenius reconstruction error is selected.
seed (None or int, default=None) – NumPy random seed.
verbose (bool, default=True) – Whether to print the progress.
- Returns:
None
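The `method` parameter controls how cells are assigned to batches. The following is a hypothetical NumPy sketch of the two options and is not MaxFuse's implementation. The helper `split_indices` and its `n_batches` argument are invented for illustration. Reading ‘binning’ as "sort cells by their score on the leading singular vector of the centered data, then cut that ordering into contiguous bins" is an assumption.

```python
import numpy as np


def split_indices(arr, n_batches, method="random", seed=None):
    """Hypothetical helper: split the rows (cells) of arr into n_batches index arrays."""
    n_cells = arr.shape[0]
    if method == "random":
        # Shuffle the cell indices, then cut them into nearly equal chunks.
        order = np.random.default_rng(seed).permutation(n_cells)
    elif method == "binning":
        # Assumption: score each cell on the leading right singular vector of the
        # centered data, sort by that score, and cut the ordering into contiguous bins.
        centered = arr - arr.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        order = np.argsort(centered @ vt[0])
    else:
        raise ValueError("method must be 'random' or 'binning'")
    return np.array_split(order, n_batches)


# Usage: 10 cells with 2 features each.
cells = np.arange(20, dtype=float).reshape(10, 2)
print([len(b) for b in split_indices(cells, 3, method="random", seed=0)])  # [4, 3, 3]
print(split_indices(cells, 2, method="binning"))
```

Random splitting gives each batch an unbiased sample of the cells. Binning, under the reading assumed here, groups cells that lie close together along the dominant axis of variation.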
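The two values of `batching_scheme` differ in which batch pairs get matched. The following sketch lists the (arr1 batch, arr2 batch) index pairs for each scheme. The helper `pair_batches` is hypothetical. The assumption is that ‘cyclic’ steps through both batch lists together and wraps the shorter list around.

```python
from itertools import product


def pair_batches(n_batches1, n_batches2, scheme="pairwise"):
    """Hypothetical helper: list (batch index in arr1, batch index in arr2) pairs."""
    if scheme == "pairwise":
        # Every batch of the first dataset is paired with every batch of the second.
        return list(product(range(n_batches1), range(n_batches2)))
    if scheme == "cyclic":
        # Step through both batch lists together, wrapping the shorter one around,
        # so every batch is used at least once with only max(n1, n2) pairs.
        n_pairs = max(n_batches1, n_batches2)
        return [(i % n_batches1, i % n_batches2) for i in range(n_pairs)]
    raise ValueError("scheme must be 'cyclic' or 'pairwise'")


print(pair_batches(2, 3, "pairwise"))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(pair_batches(2, 3, "cyclic"))    # [(0, 0), (1, 1), (0, 2)]
```

In this sketch, ‘pairwise’ produces n1 × n2 matching problems and ‘cyclic’ produces only max(n1, n2). That is the usual trade-off between coverage and run time.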
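`prebatching_smoothing` and the `shared_wt*`/`active_wt*` weights describe shrinking the data towards a low-rank approximation, with the weight placed on the raw data. One plausible reading is a convex combination wt · raw + (1 − wt) · low-rank. The following NumPy sketch implements that reading. It is an assumption, not MaxFuse's code, and the helper name `shrink_towards_low_rank` and its `n_components` rank are hypothetical.

```python
import numpy as np


def shrink_towards_low_rank(arr, n_components, wt=1.0):
    """Hypothetical helper: blend raw data with its rank-n_components SVD approximation."""
    u, s, vt = np.linalg.svd(arr, full_matrices=False)
    # Best rank-k approximation in Frobenius norm (Eckart-Young).
    low_rank = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    # wt is the weight on the raw data: wt=1 leaves arr unchanged, wt=0 keeps only the low-rank part.
    return wt * arr + (1.0 - wt) * low_rank


rng = np.random.default_rng(0)
x = rng.standard_normal((30, 8))
print(np.allclose(shrink_towards_low_rank(x, 2, wt=1.0), x))  # True
print(np.linalg.matrix_rank(shrink_towards_low_rank(x, 2, wt=0.0), tol=1e-8))  # 2
```

Under this reading, the default weight of 1 means no shrinkage, which is consistent with `prebatching_smoothing` defaulting to False.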
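The `*_svd_components*`, `randomized_svd` and `svd_runs` parameters together describe an SVD-based dimension reduction. You can pick exact or randomized SVD, and with randomized SVD you can repeat the run and keep the result with the lowest Frobenius reconstruction error. The following is a minimal NumPy sketch of that selection loop. The randomized SVD here is a basic random range finder chosen for illustration; the library may use a different routine. The helpers `randomized_svd` and `reduce_dim` are hypothetical, and the sketch does not center the data, since the source does not say whether MaxFuse does.

```python
import numpy as np


def randomized_svd(arr, k, n_oversamples=10, rng=None):
    """Basic randomized SVD: random range finder followed by a small exact SVD."""
    rng = np.random.default_rng(rng)
    # Sample the range of arr with a Gaussian test matrix, then orthonormalize it.
    omega = rng.standard_normal((arr.shape[1], k + n_oversamples))
    q, _ = np.linalg.qr(arr @ omega)
    # Exact SVD of the small projected matrix, lifted back with q.
    u_small, s, vt = np.linalg.svd(q.T @ arr, full_matrices=False)
    return (q @ u_small)[:, :k], s[:k], vt[:k]


def reduce_dim(arr, n_components, randomized=False, svd_runs=1, seed=None):
    """Hypothetical helper: n_components-dimensional SVD embedding of the rows of arr."""
    rng = np.random.default_rng(seed)
    best, best_err = None, np.inf
    for _ in range(svd_runs):
        if randomized:
            u, s, vt = randomized_svd(arr, n_components, rng=rng)
        else:
            # Exact SVD is deterministic, so extra runs give the same result.
            u, s, vt = np.linalg.svd(arr, full_matrices=False)
            u, s, vt = u[:, :n_components], s[:n_components], vt[:n_components]
        # Keep the run with the lowest Frobenius reconstruction error.
        err = np.linalg.norm(arr - (u * s) @ vt, "fro")
        if err < best_err:
            best, best_err = u * s, err
    return best, best_err


rng = np.random.default_rng(1)
data = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 20))
embedding, err = reduce_dim(data, 5, randomized=True, svd_runs=3, seed=0)
print(embedding.shape)  # (60, 5)
```

Repeating the run only matters for randomized SVD: each run draws a new test matrix, and keeping the best run guards against an unlucky draw.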