maxfuse.match_utils.get_refined_matching_one_iter
- maxfuse.match_utils.get_refined_matching_one_iter(init_matching, arr1, arr2, clust_labels1=None, clust_labels2=None, edges1=None, edges2=None, wt1=0.5, wt2=0.5, filter_prop=0, cca_components=15, cca_max_iter=2000, verbose=True)
Run one iteration of CCA refinement.
- Parameters:
init_matching (list) – Initial matching: init_matching[0][i] and init_matching[1][i] form a matched pair, and init_matching[2][i] is the distance for that pair.
arr1 (np.array of shape (n_samples1, n_features1)) – The first data matrix.
arr2 (np.array of shape (n_samples2, n_features2)) – The second data matrix.
clust_labels1 (None or np.array of shape (n_samples1, )) – If not None, these are the cluster labels for the first data matrix, and that matrix is smoothed via cluster centroid shrinkage.
clust_labels2 (None or np.array of shape (n_samples2, )) – Same as clust_labels1 but for the second data matrix.
edges1 (None or list of length two or three) – If not None, each edge in the graph is (edges1[0][i], edges1[1][i]) with weight edges1[2][i] (if present), and the first data matrix is smoothed via graph smoothing.
edges2 (None or scipy.sparse.csr_matrix of shape (n_samples2, n_samples2)) – Same as edges1 but for the second data matrix.
wt1 (float, default=0.5) – The smoothed first data matrix is wt1 * (CCA embedding of arr1) + (1 - wt1) * shrinkage_targets, where shrinkage_targets are either the cluster centroids or the averages of graph neighbors.
wt2 (float, default=0.5) – Same as wt1 but for the second data matrix.
filter_prop (float, default=0) – Proportion of matched pairs to discard before feeding into refinement iterations.
cca_components (int, default=15) – Number of CCA components.
cca_max_iter (int, default=2000) – Maximum number of CCA iterations.
verbose (bool, default=True) – Whether to print the
- Returns:
rows, cols, vals (list) – rows[i] and cols[i] form a matched pair, and vals[i] is the distance for that pair.
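The weighted smoothing described for wt1 and the three-list matching format can be illustrated with a small NumPy sketch. This is not the library's implementation: shrink_to_centroids and filter_matching are hypothetical helper names, the embedding is assumed to be already computed, and the rule that filter_prop discards the pairs with the largest distances is an assumption made for illustration only.

```python
import numpy as np


def shrink_to_centroids(emb, labels, wt=0.5):
    """Return wt * emb + (1 - wt) * (centroid of each row's cluster).

    Hypothetical helper mirroring the formula given for wt1/wt2,
    with cluster centroids as the shrinkage targets.
    """
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)
    targets = np.empty_like(emb)
    for lab in np.unique(labels):
        mask = labels == lab
        # Every row in a cluster gets that cluster's mean as its target.
        targets[mask] = emb[mask].mean(axis=0)
    return wt * emb + (1 - wt) * targets


def filter_matching(matching, filter_prop=0.0):
    """Drop a fraction filter_prop of matched pairs.

    Assumption (not confirmed by the documentation): the discarded
    pairs are the ones with the largest distances.
    """
    rows, cols, vals = (np.asarray(x) for x in matching)
    n_keep = len(vals) - int(round(filter_prop * len(vals)))
    # Indices of the n_keep smallest distances, restored to original order.
    keep = np.sort(np.argsort(vals, kind="stable")[:n_keep])
    return rows[keep].tolist(), cols[keep].tolist(), vals[keep].tolist()
```

With wt=1 the embedding is returned unchanged, and with wt=0 every row collapses onto its cluster centroid, so the default of 0.5 sits halfway between the two.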