MCPcopy
hub / github.com/AutoViML/AutoViz / find_corr_vars

Function find_corr_vars

autoviz/AutoViz_Utils.py:1947–1966  ·  view source on GitHub ↗

This returns a dictionary of counts of each variable and how many vars it is correlated to in the dataframe

(correlation_dataframe, corr_limit=0.70)

Source from the content-addressed store, hash-verified

1945
1946
1947def find_corr_vars(correlation_dataframe, corr_limit=0.70):
1948 """
1949 This returns a dictionary of counts of each variable and how many vars it is correlated to in the dataframe
1950 """
1951 flatten = lambda l: [item for sublist in l for item in sublist]
1952 flatten_items = lambda dic: [x for x in dic.items()]
1953 a = correlation_dataframe.values
1954 col_index = correlation_dataframe.columns.tolist()
1955 index_triupper = list(zip(np.triu_indices_from(a, k=1)[0], np.triu_indices_from(a, k=1)[1]))
1956 high_corr_index_list = [x for x in np.argwhere(abs(a[np.triu_indices(len(a), k=1)]) >= corr_limit)]
1957 tuple_list = [y for y in [index_triupper[x[0]] for x in high_corr_index_list]]
1958 correlated_pair = [(col_index[tup[0]], col_index[tup[1]]) for tup in tuple_list]
1959 correlated_pair_dict = dict(correlated_pair)
1960 flat_corr_pair_list = [item for sublist in correlated_pair for item in sublist]
1961 #### You can make it a dictionary or a tuple of lists. We have chosen the latter here to keep order intact.
1962 # corr_pair_count_dict = Counter(flat_corr_pair_list)
1963 corr_pair_count_dict = count_freq_in_list(flat_corr_pair_list)
1964 corr_list = list(set(flatten(flatten_items(correlated_pair_dict))))
1965 rem_col_list = left_subtract(list(correlation_dataframe), list(OrderedDict.fromkeys(flat_corr_pair_list)))
1966 return corr_pair_count_dict, rem_col_list, corr_list, correlated_pair_dict
1967
1968
1969#################################################################################

Callers

nothing calls this directly

Calls 2

count_freq_in_listFunction · 0.85
left_subtractFunction · 0.70

Tested by

no test coverage detected