maxfuse.match_utils.get_refined_matching

maxfuse.match_utils.get_refined_matching(init_matching, arr1, arr2, randomized_svd=False, svd_runs=1, svd_components1=None, svd_components2=None, clust_labels1=None, clust_labels2=None, edges1=None, edges2=None, wt1=0.5, wt2=0.5, n_iters=3, filter_prop=0, cca_components=15, cca_max_iter=2000, verbose=True)

Refine the initial matching init_matching over n_iters refinement iterations.

Parameters:

init_matching (list) – (init_matching[0][i], init_matching[1][i]) is a matched pair, and init_matching[2][i] is the distance for this pair.
arr1 (np.array of shape (n_samples1, n_features1)) – The first data matrix.
arr2 (np.array of shape (n_samples2, n_features2)) – The second data matrix.
randomized_svd (bool, default=False) – Whether to use randomized SVD.
svd_runs (int, default=1) – Randomized SVD gives different results on different runs, so if randomized_svd=True, perform svd_runs randomized SVDs and pick the one with the smallest Frobenius reconstruction error. If randomized_svd=False, svd_runs is forced to be 1.
svd_components1 (None or int) – If None, no SVD is done; otherwise, the number of components to keep when de-noising the first data matrix with SVD before feeding it into CCA.
svd_components2 (None or int) – Same as svd_components1 but for the second data matrix.
clust_labels1 (None or np.array of shape (n_samples1, )) – If not None, the cluster labels of the first data matrix; this matrix is then smoothed via cluster centroid shrinkage.
clust_labels2 (None or np.array of shape (n_samples2, )) – Same as clust_labels1 but for the second data matrix.
edges1 (None or list of length two or three) – If not None, each edge in the graph is (edges1[0][i], edges1[1][i]), with weight edges1[2][i] if a third entry is given; this matrix is then smoothed via graph smoothing.
edges2 (None or list of length two or three) – Same as edges1 but for the second data matrix.
wt1 (float, default=0.5) – The smoothing of the first data matrix will be wt1 * (cca embedding of arr1) + (1-wt1) * shrinkage_targets, where the shrinkage_targets will be either the cluster centroids or the average of graph neighbors.
wt2 (float, default=0.5) – Same as wt1 but for the second data matrix.
n_iters (int, default=3) – Number of refinement iterations.
filter_prop (float, default=0) – Proportion of matched pairs to discard before feeding into refinement iterations.
cca_components (int, default=15) – Number of CCA components.
cca_max_iter (int, default=2000) – Maximum number of CCA iterations.
verbose (bool, default=True) – Whether to print the progress.
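
The smoothing rule described for wt1 and wt2 can be sketched in NumPy. This is a minimal illustration, not the library's implementation: the helper names shrink_to_centroids and shrink_to_neighbors are hypothetical, the embedding is a toy array standing in for a CCA embedding, and the graph variant assumes that each edge (rows[i], cols[i]) contributes the embedding of cols[i] to the neighbor average of rows[i].

```python
import numpy as np


def shrink_to_centroids(embedding, labels, wt):
    """Return wt * embedding + (1 - wt) * centroid of each row's cluster."""
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    targets = np.empty_like(embedding)
    for lab in np.unique(labels):
        mask = labels == lab
        targets[mask] = embedding[mask].mean(axis=0)
    return wt * embedding + (1.0 - wt) * targets


def shrink_to_neighbors(embedding, edges, wt):
    """Return wt * embedding + (1 - wt) * weighted average of graph neighbors.

    Assumption: edges is [rows, cols] or [rows, cols, weights], and edge i
    makes cols[i] a neighbor of rows[i].
    """
    embedding = np.asarray(embedding, dtype=float)
    rows, cols = np.asarray(edges[0]), np.asarray(edges[1])
    if len(edges) == 3:
        weights = np.asarray(edges[2], dtype=float)
    else:
        weights = np.ones(len(rows))
    sums = np.zeros_like(embedding)
    totals = np.zeros(embedding.shape[0])
    # Unbuffered accumulation so repeated row indices are all counted.
    np.add.at(sums, rows, weights[:, None] * embedding[cols])
    np.add.at(totals, rows, weights)
    # Rows without neighbors use their own embedding as the target.
    targets = embedding.copy()
    has_neighbors = totals > 0
    targets[has_neighbors] = sums[has_neighbors] / totals[has_neighbors, None]
    return wt * embedding + (1.0 - wt) * targets


# Toy embedding: 4 samples, 2 components, two clusters.
emb = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 14.0]])
labels = np.array([0, 0, 1, 1])
smoothed = shrink_to_centroids(emb, labels, wt=0.5)

# Unweighted edges in the [rows, cols] layout: 0<->1 and 2<->3.
edges = [[0, 1, 2, 3], [1, 0, 3, 2]]
graph_smoothed = shrink_to_neighbors(emb, edges, wt=0.5)
```

With wt=1 both helpers return the embedding unchanged, and with wt=0 every row is replaced by its shrinkage target.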

Returns:

matching (list of length 3) – rows, cols, vals = matching. Each matched pair is (rows[i], cols[i]), and its distance is vals[i].
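
The three-list layout can be unpacked and post-processed with plain Python. The sketch below uses a made-up toy matching rather than real output, and drop_worst_pairs is a hypothetical helper; discarding the pairs with the largest distances is an assumption made for illustration, not a statement of how filter_prop is applied internally.

```python
# Toy matching in the documented layout: [rows, cols, vals].
matching = [[0, 1, 2, 3], [5, 3, 7, 1], [0.12, 0.80, 0.05, 0.33]]
rows, cols, vals = matching


def drop_worst_pairs(matching, prop):
    """Discard the fraction `prop` of pairs with the largest distances."""
    rows, cols, vals = matching
    n_keep = len(vals) - int(prop * len(vals))
    # Indices of the n_keep smallest distances, restored to original order.
    by_distance = sorted(range(len(vals)), key=lambda i: vals[i])[:n_keep]
    keep = sorted(by_distance)
    return [
        [rows[i] for i in keep],
        [cols[i] for i in keep],
        [vals[i] for i in keep],
    ]


filtered = drop_worst_pairs(matching, prop=0.25)
for r, c, d in zip(*filtered):
    print(f"row {r} of arr1 <-> row {c} of arr2, distance {d:.2f}")
```

Here the pair (1, 3) with distance 0.80 is dropped, and the remaining three pairs keep their original order.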