MatchConfig

This submodule defines the MatchConfig class. A MatchConfig object holds user-specified parameters for DataFrame matching (see match.py), such as which columns of the second DataFrame to include alongside those of the first (import_include_col) or where to write the output (output_path). The object can then be passed to the match function from match.py or to Table.match on a Table object.

class chromaquant.match.match_config.MatchConfig(do_export: bool = False, import_include_col: list[str] | None = None, local_filter_row: dict[str, str | bool | float | int] | None = None, match_conditions: list[dict[str, Any]] | None = None, multiple_hits_rule: Callable[[Any, DataFrame, str, float | int, bool], Series] | None = None, multiple_hits_column: str = '', output_cols_dict: dict[str, str] | None = None, output_path: str = 'match_results.csv')

Class used to define how data from two Pandas DataFrames should be matched.

Parameters

do_export : bool, optional

True if match results should be exported to .csv, by default False.

import_include_col : list[str] | None, optional

List of columns to include from the second DataFrame in addition to the columns of the first DataFrame, by default None.

local_filter_row : dict[str, str | bool | float | int] | None, optional

Dictionary containing the name of the column used to filter the first DataFrame as key and the row value to filter by as value, by default None.

match_conditions : list[dict[str, Any]] | None, optional

List of conditions by which to match the DataFrames (see Notes), by default None.

multiple_hits_rule : Callable[[DataFrame, str], Series] | None, optional

Function that selects one Series (a single hit) from a DataFrame of multiple hits. Built-in options such as SELECT_FIRST_ROW are provided. By default None.

multiple_hits_column : str, optional

Name of the column to which the multiple hits rule is applied, by default ''.

output_cols_dict : dict[str, str] | None, optional

Dictionary whose keys are column names as written in the matched datasets and whose values are the column names desired in the output DataFrame, by default None.

output_path : str, optional

Path to the output file, including file name and extension, by default 'match_results.csv'.

Raises

ValueError

If a list of more than two strings is passed as the comparison parameter of add_match_condition.

Notes

The expected structure of match_conditions is as follows:

[{
    'condition': cq.MatchConfig.IS_EQUAL,
    'first_DF_column': str,
    'second_DF_column': str,
    'kwargs': {
        'error': float (optional),
        'or_equal': bool (optional),
        'value_function': Callable (optional)
    }
},
...]

The condition can be replaced with GREATER_THAN, LESS_THAN, or any user-defined function with the same arguments and return pattern.
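A minimal sketch of how such a list might be assembled and consumed. The stand-in condition is_equal_rows, the column names, and the way kwargs is unpacked into the call are assumptions made for illustration, not chromaquant's implementation. Plain lists of row dicts stand in for DataFrames so the sketch runs without pandas.

```python
from typing import Any, Dict, List


# Hypothetical stand-in for a condition such as MatchConfig.IS_EQUAL.
# It follows the documented argument order: value, table, column name, kwargs.
def is_equal_rows(value: Any, rows: List[Dict[str, Any]], column: str,
                  error: float = 0) -> List[Dict[str, Any]]:
    """Return the rows whose `column` entry equals `value` within `error`."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return [r for r in rows if abs(r[column] - value) <= error]
    return [r for r in rows if r[column] == value]


# One dictionary per condition, using the documented keys.
match_conditions = [
    {
        "condition": is_equal_rows,
        "first_DF_column": "Retention Time",  # hypothetical column name
        "second_DF_column": "RT (min)",       # hypothetical column name
        "kwargs": {"error": 0.05},
    },
]

# Hypothetical rows standing in for the second DataFrame.
second_rows = [
    {"RT (min)": 1.02, "Name": "A"},
    {"RT (min)": 2.50, "Name": "B"},
]

# Assumed calling pattern: look up each value from the first table in the second.
entry = match_conditions[0]
hits = entry["condition"](1.00, second_rows, entry["second_DF_column"],
                          **entry["kwargs"])
```

Here hits holds only the row named "A", because 1.02 lies within 0.05 of the looked-up value 1.00.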

static FUNCTION_OF(value: Any, DF: DataFrame, DF_column_name: str, value_function: Callable[[Any], Any], error: float | int = 0) DataFrame

Returns slice of a DataFrame where a passed value equals the result of applying a function to the values in one of its columns.

Parameters

value : Any

A value of any type, compared against the result of applying value_function to the column's values.

DF : Pandas DataFrame

A Pandas DataFrame to compare against value.

DF_column_name : str

The name of the column in the DataFrame whose values are compared against value.

value_function : Callable[[Any], Any]

A function that accepts a value from the DataFrame column and returns a value to compare against the passed value.

error : float | int, optional

A float or integer defining acceptable error for float or integer value, by default 0.

Returns

pd.DataFrame

Slice of DataFrame where some value in a given column passed through a function is equal to a passed value.
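The behavior described above can be approximated with a pandas boolean mask. This is a sketch under stated assumptions, not the library's code: the name function_of_sketch is hypothetical, and treating error as an inclusive absolute tolerance is an assumption.

```python
from typing import Any, Callable

import pandas as pd


def function_of_sketch(value: Any, df: pd.DataFrame, column: str,
                       value_function: Callable[[Any], Any],
                       error: float = 0) -> pd.DataFrame:
    """Keep rows where value_function(row[column]) matches `value`."""
    # Apply the function to every entry of the chosen column.
    transformed = df[column].map(value_function)
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Assumption: numeric values match within an inclusive absolute error.
        mask = (transformed - value).abs() <= error
    else:
        mask = transformed == value
    return df[mask]


table = pd.DataFrame({"x": [1.0, 2.0, 3.0], "label": ["a", "b", "c"]})

# Find rows whose doubled x equals 4.0.
hits = function_of_sketch(4.0, table, "x", lambda x: x * 2)
```

With this data, hits contains only the row labeled "b", since doubling 2.0 gives 4.0.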

static GREATER_THAN(value: Any, DF: DataFrame, DF_column_name: str, or_equal: bool = False) DataFrame

Returns slice of a DataFrame where a passed value is greater than the values in one of its columns.

Parameters

value : Any

A value of any type, checked for whether it is greater than the values in the given column of DF.

DF : Pandas DataFrame

A Pandas DataFrame to compare against value.

DF_column_name : str

The name of the column in the DataFrame whose values are compared against value.

or_equal : bool, optional

True if value can be equal to values in DataFrame column, by default False.

Returns

pd.DataFrame

Slice of DataFrame where values in a given column are less than a given value.
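To make the direction of the comparison concrete, here is a sketch of the semantics described above: rows are kept when the passed value exceeds the column entry. The function name greater_than_sketch is hypothetical and the body is an assumed equivalent, not the library's implementation.

```python
from typing import Any

import pandas as pd


def greater_than_sketch(value: Any, df: pd.DataFrame, column: str,
                        or_equal: bool = False) -> pd.DataFrame:
    """Keep rows whose `column` entry is below `value` (or equal, if allowed)."""
    if or_equal:
        mask = df[column] <= value  # value >= column entry
    else:
        mask = df[column] < value   # value > column entry
    return df[mask]


table = pd.DataFrame({"rt": [1.0, 2.0, 3.0]})

strict = greater_than_sketch(2.0, table, "rt")
inclusive = greater_than_sketch(2.0, table, "rt", or_equal=True)
```

The strict call keeps only the row with rt 1.0, while or_equal=True also keeps the row with rt 2.0.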

static IS_EQUAL(value: Any, DF: DataFrame, DF_column_name: str, error: float | int = 0) DataFrame

Returns slice of a DataFrame where the values in one of its columns are equal to a passed value.

Parameters

value : Any

A value of any type, checked for whether it is equal to the values in the given column of DF.

DF : Pandas DataFrame

A Pandas DataFrame to compare against value.

DF_column_name : str

The name of the column in the DataFrame whose values are compared against value.

error : float | int, optional

A float or integer defining acceptable error for float or integer value, by default 0.

Returns

pd.DataFrame

Slice of DataFrame where values in a given column are equal to a given value, optionally within a given error.
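The role of error can be illustrated with a small pandas sketch. The name is_equal_sketch is hypothetical, and the inclusive absolute tolerance for numeric values is an assumption rather than confirmed library behavior.

```python
from typing import Any

import pandas as pd


def is_equal_sketch(value: Any, df: pd.DataFrame, column: str,
                    error: float = 0) -> pd.DataFrame:
    """Keep rows whose `column` entry equals `value`, within `error` if numeric."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Assumption: |column entry - value| <= error counts as equal.
        mask = (df[column] - value).abs() <= error
    else:
        # Non-numeric values (e.g. strings) are compared exactly.
        mask = df[column] == value
    return df[mask]


peaks = pd.DataFrame({"rt": [1.00, 1.50, 2.25], "name": ["a", "b", "c"]})

exact = is_equal_sketch(1.50, peaks, "rt")             # exact numeric match
near = is_equal_sketch(1.45, peaks, "rt", error=0.1)   # match within tolerance
by_name = is_equal_sketch("c", peaks, "name")          # exact string match
```

Both exact and near select the row named "b"; by_name selects the row named "c".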

static LESS_THAN(value: Any, DF: DataFrame, DF_column_name: str, or_equal: bool = False) DataFrame

Returns slice of a DataFrame where a passed value is less than the values in one of its columns.

Parameters

value : Any

A value of any type, checked for whether it is less than the values in the given column of DF.

DF : Pandas DataFrame

A Pandas DataFrame to compare against value.

DF_column_name : str

The name of the column in the DataFrame whose values are compared against value.

or_equal : bool, optional

True if value can be equal to values in DataFrame column, by default False.

Returns

pd.DataFrame

Slice of DataFrame where values in a given column are greater than a given value.

static SELECT_FIRST_ROW(DF: DataFrame, column_name: str) Series

Multiple hits rule to select first row of DataFrame.

Parameters

DF : pd.DataFrame

DataFrame to apply multiple hits rule to.

column_name : str

Name of column to consider in rule.

Returns

pd.Series

A row from the passed DF.

static SELECT_HIGHEST_VALUE(DF: DataFrame, column_name: str) Series

Multiple hits rule to select row of DataFrame where column has highest value.

Parameters

DF : pd.DataFrame

DataFrame to apply multiple hits rule to.

column_name : str

Name of column to consider in rule.

Returns

pd.Series

A row from the passed DF.

static SELECT_LOWEST_VALUE(DF: DataFrame, column_name: str) Series

Multiple hits rule to select row of DataFrame where column has lowest value.

Parameters

DF : pd.DataFrame

DataFrame to apply multiple hits rule to.

column_name : str

Name of column to consider in rule.

Returns

pd.Series

A row from the passed DF.
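The three built-in multiple hits rules share the signature (DF, column_name) -> Series. The sketch below shows assumed pandas equivalents of each rule. The function names are hypothetical and the bodies are not taken from chromaquant. A user-defined rule would presumably follow the same pattern.

```python
import pandas as pd


def select_first_row_sketch(df: pd.DataFrame, column_name: str) -> pd.Series:
    """Return the first row; column_name is accepted but unused."""
    return df.iloc[0]


def select_highest_value_sketch(df: pd.DataFrame, column_name: str) -> pd.Series:
    """Return the row with the largest entry in column_name."""
    # Assumes unique index labels, so .loc returns a single row (a Series).
    return df.loc[df[column_name].idxmax()]


def select_lowest_value_sketch(df: pd.DataFrame, column_name: str) -> pd.Series:
    """Return the row with the smallest entry in column_name."""
    return df.loc[df[column_name].idxmin()]


# Hypothetical set of multiple hits for one lookup.
hits = pd.DataFrame({"name": ["a", "b", "c"], "score": [0.7, 0.9, 0.4]})

first = select_first_row_sketch(hits, "score")
highest = select_highest_value_sketch(hits, "score")
lowest = select_lowest_value_sketch(hits, "score")
```

For this data the rules pick the rows named "a", "b", and "c", respectively.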

add_match_condition(condition: Callable[[DataFrame, str], Series], comparison: str | list[str], kwargs: dict[str, Any] = {})

Adds a new match condition to the MatchConfig instance.

Parameters

condition : Callable(Any, pd.DataFrame, str, float | int, bool) -> pd.Series

A condition that accepts a comparison value of any type, a DataFrame to compare the value against, the name of the column containing values to compare to the comparison value, and optional parameters for the error and whether to use inclusive inequalities (e.g., greater than or equal to), respectively.

comparison : str or list[str]

The name of the column to compare across the two DataFrames (if the column has the same name in both), or a list of two column names to compare (if the names differ).

kwargs : dict[str, Any]

A dictionary of additional keyword arguments to pass to the match condition. See each match condition option for applicable keywords.

Returns

None
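One way to picture what add_match_condition does with comparison is sketched below. The helpers normalize_comparison and build_condition_entry are hypothetical, the rejection of lists shorter than two names is an assumption, and the resulting dictionary simply mirrors the structure shown in the Notes.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple, Union


def normalize_comparison(comparison: Union[str, List[str]]) -> Tuple[str, str]:
    """Map `comparison` to (first_DF_column, second_DF_column)."""
    if isinstance(comparison, str):
        # One name: the column is called the same in both DataFrames.
        return comparison, comparison
    if len(comparison) > 2:
        # Documented behavior: more than two names raises ValueError.
        raise ValueError("comparison accepts at most two column names")
    if len(comparison) != 2:
        # Assumption of this sketch: shorter lists are rejected as well.
        raise ValueError("comparison list must contain two column names")
    first, second = comparison
    return first, second


def build_condition_entry(condition: Callable[..., Any],
                          comparison: Union[str, List[str]],
                          kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build one match_conditions entry in the structure shown in the Notes."""
    first, second = normalize_comparison(comparison)
    return {
        "condition": condition,
        "first_DF_column": first,
        "second_DF_column": second,
        "kwargs": dict(kwargs or {}),  # copy so entries never share a dict
    }


def placeholder_condition(*args: Any, **kwargs: Any) -> None:
    """Hypothetical condition used only to fill the entry."""
    return None


same_name = build_condition_entry(placeholder_condition, "RT", {"error": 0.1})
two_names = build_condition_entry(placeholder_condition, ["RT", "RT (min)"])
```

The sketch uses None as the default for kwargs and copies the dictionary, which avoids sharing one mutable default between calls. Whether the library needs the same precaution is not stated in the source.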

do_export: bool
import_include_col: list[str]
local_filter_row: dict[str, str | bool | float | int]
match_conditions: list[Any]
multiple_hits_column: str
multiple_hits_rule: Callable[[DataFrame, str], Series]
output_cols_dict: dict
output_path: str