Working with Data

Many operations in ChromaQuant rely on DataSets, objects that contain either one- or two-dimensional data and offer methods to get, set, and manipulate these data. The simplest DataSet available to users through the API is the Value class, whose instances each contain a single datum. There is also a Table class, which stores two-dimensional data using the pandas.DataFrame class.

Values

This class is primarily targeted at users who wish to work with values like floats, integers, strings, and booleans. A Value can be created like so:

my_value = cq.Value()

Data can be added to a Value in a few different ways. For one, data can be passed as an argument during instantiation:

some_data = 14.6
my_value = cq.Value(data=some_data)

Another way is to set the data attribute directly:

new_data = 31.9
my_value.data = new_data

The data attribute also allows for getting and deleting its contents:

# Get the current data
extracted_data = my_value.data
# Delete the current data
del my_value.data
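
The get, set, and delete behavior described above is what a Python property provides. The sketch below shows that pattern with a hypothetical stand-in class, SketchValue. It is not ChromaQuant's implementation, and resetting deleted data to None is an assumption made purely for illustration.

```python
class SketchValue:
    """Hypothetical stand-in illustrating a get/set/delete data attribute."""

    def __init__(self, data=None):
        self._data = data

    @property
    def data(self):
        # Getter: runs on `obj.data`
        return self._data

    @data.setter
    def data(self, new_data):
        # Setter: runs on `obj.data = ...`
        self._data = new_data

    @data.deleter
    def data(self):
        # Deleter: runs on `del obj.data`
        # Assumption: deleting clears the stored datum back to None
        self._data = None


sketch = SketchValue(data=14.6)
first_read = sketch.data      # value passed at instantiation
sketch.data = 31.9
second_read = sketch.data     # value assigned through the setter
del sketch.data
after_delete = sketch.data    # None in this sketch
```

What a real Value returns after its data is deleted is not stated on this page, so check the API reference before relying on that behavior.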

Tables

Tables behave much like Values in some respects (both are children of the DataSet class, after all). The key difference is that Tables are intended to contain two-dimensional data, which the Table class stores by leveraging the extensive functionality of the pandas.DataFrame class. We can create a Table in much the same way as a Value:

my_table = cq.Table()

We can assign data using either the data argument or the data attribute. First, we create a DataFrame using pandas:

import pandas
my_dictionary = {'Column A': [1, 2, 3], 'Column B': [4, 5, 6]}
my_dataframe = pandas.DataFrame(my_dictionary)

Then, we can assign this dataframe to our Table:

# Assign using the data argument
my_table = cq.Table(data=my_dataframe)
# Reassign by setting the data attribute
my_table.data = my_dataframe

As with Values, we can also get or delete data from Tables:

# Get the current data
extracted_data = my_table.data
# Delete the current data
del my_table.data

Something else we can do with Tables is read our data directly from .csv files:

# Create a new table
my_table = cq.Table()
# Import the data from a .csv file
my_table.import_csv_data('my_data.csv')
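
For reference, the sketch below shows the kind of comma-separated layout that maps onto a two-dimensional table, read here with pandas.read_csv. Whether import_csv_data uses pandas internally, and which parsing options it accepts, is not stated on this page. The in-memory text stands in for the contents of a hypothetical my_data.csv file.

```python
import io
import pandas

# In-memory stand-in for the contents of a hypothetical my_data.csv
csv_text = "Column A,Column B\n1,4\n2,5\n3,6\n"

# The first line becomes the column headers; each later line becomes a row
csv_dataframe = pandas.read_csv(io.StringIO(csv_text))

print(csv_dataframe.shape)
print(list(csv_dataframe.columns))
```

A DataFrame read this way can also be handed to a Table through the data argument or the data attribute, as shown earlier.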

Back to our Example

Let’s return now to the scenario outlined on the previous page. Imagine we are given a sample of bergamot oil to analyze on our GC-MS system. After injecting the sample into the instrument, acquiring data, and performing best hits analysis by comparing the collected mass spectra to library spectra, we have the following table of MS results:

Peak Number	Retention Time (min)	Compound	Formula
1	18.23	tricyclene	C10H16
2	18.81	α-thujene	C10H16
3	19.26	α-pinene	C10H16
4	20.12	camphene	C10H16
5	21.75	sabinene	C10H16
6	21.89	β-pinene	C10H16
7	22.44	6-methyl-5-hepten-2-one	C8H14O
8	22.61	myrcene	C10H16
9	23.26	octanal	C8H16O
10	23.31	α-phellandrene	C10H16
11	24.01	δ-3-carene	C10H16
12	24.21	α-terpinene	C10H16
13	24.89	p-cymene	C10H14
14	25.06	limonene	C10H16
15	25.13	1,8-cineole	C10H18O
16	25.36	(Z)-β-ocimene	C10H16
17	26.00	(E)-β-ocimene	C10H16
18	26.81	γ-terpinene	C10H16
19	27.09	cis-sabinene hydrate	C10H18O
20	27.78	octanol	C8H18O
21	28.43	terpinolene	C10H16
22	29.11	linalool	C10H18O
23	29.34	nonanal	C9H18O
24	29.72	heptyl acetate	C9H18O2
25	29.98	cis-limonene oxide	C10H16O
26	38.53	linalyl acetate	C12H20O2
27	54.49	β-bisabolene	C15H24

We also get the following table of FID integration results:

Peak Number	Retention Time (min)	Area
1	18.24	9.642202419
2	18.79	721.0011595
3	19.30	2626.975549
4	20.10	80.28715918
5	20.36	13.20155492
6	21.71	2599.987859
7	21.90	16464.64891
8	22.48	9.540131909
9	22.61	2383.918994
10	23.22	67.72271269
11	23.27	84.84936771
12	24.04	11.54458953
13	24.23	381.0284691
14	24.88	954.0299001
15	25.02	79746.12728
16	25.10	32.72194488
17	25.33	92.3024417
18	25.48	8.178220039
19	25.97	476.8954081
20	26.85	17449.17202
21	27.11	94.59826749
22	27.76	20.20661954
23	28.41	746.6239539
24	29.15	23250.14097
25	29.39	79.28369498
26	29.71	7.280006075
27	29.83	3.327113495
28	30.01	22.03782356
29	38.51	66953.91378
30	54.47	1183.234388

Let’s start with the integration table. We can create a Table instance in Python by first adding our data to a DataFrame:

# Define a dictionary with our data as tabulated above
fid_dictionary = {'Peak Number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                                 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
                 'Retention Time': [18.24, 18.79, 19.30, 20.10, 20.36, 21.71, 21.90, 22.48,
                                    22.61, 23.22, 23.27, 24.04, 24.23, 24.88, 25.02, 25.10,
                                    25.33, 25.48, 25.97, 26.85, 27.11, 27.76, 28.41, 29.15,
                                    29.39, 29.71, 29.83, 30.01, 38.51, 54.47],
                 'Area': [9.642202419, 721.0011595, 2626.975549, 80.28715918, 13.20155492, 2599.987859,
                          16464.64891, 9.540131909, 2383.918994, 67.72271269, 84.84936771, 11.54458953,
                          381.0284691, 954.0299001, 79746.12728, 32.72194488, 92.3024417, 8.178220039,
                          476.8954081, 17449.17202, 94.59826749, 20.20661954, 746.6239539, 23250.14097,
                          79.28369498, 7.280006075, 3.327113495, 22.03782356, 66953.91378, 1183.234388]}

# Create a DataFrame from the dictionary
fid_dataframe = pandas.DataFrame(fid_dictionary)

# Create a Table with the data set to the DataFrame
fid_table = cq.Table(data=fid_dataframe)
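
When typing out long lists by hand, it is easy to drop a value. pandas raises a ValueError if the lists in the dictionary have different lengths, so a quick look at the DataFrame's shape and column types before building the Table can catch mistakes early. The sketch below uses only the first five FID peaks from the integration table above, and its variable names are illustrative.

```python
import pandas

# First five FID peaks from the integration table above
fid_subset = {'Peak Number': [1, 2, 3, 4, 5],
              'Retention Time': [18.24, 18.79, 19.30, 20.10, 20.36],
              'Area': [9.642202419, 721.0011595, 2626.975549,
                       80.28715918, 13.20155492]}
subset_dataframe = pandas.DataFrame(fid_subset)

# One row per peak, one column per dictionary key
print(subset_dataframe.shape)
# Peak numbers should be integers; times and areas should be floats
print(subset_dataframe.dtypes)

# Lists of unequal length are rejected when the DataFrame is built
try:
    pandas.DataFrame({'Peak Number': [1, 2, 3],
                      'Area': [9.642202419, 721.0011595]})
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```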

Alternatively, we could read the data directly from a .csv file if it is in that format. Let’s do that with our MS data:

# Create a new table
ms_table = cq.Table()
# Import the data from a .csv file
ms_table.import_csv_data('ms_data.csv')
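
If the MS results are not yet in a file, one way to produce a .csv is to write it from a DataFrame with pandas. The sketch below writes the first three MS peaks from the table above to a temporary directory and reads them back. The layout used here, a header row followed by one row per peak, is an assumption about what import_csv_data expects, which this page does not specify.

```python
import os
import tempfile
import pandas

# First three MS peaks from the table above
ms_subset = pandas.DataFrame({
    'Peak Number': [1, 2, 3],
    'Retention Time (min)': [18.23, 18.81, 19.26],
    'Compound': ['tricyclene', 'α-thujene', 'α-pinene'],
    'Formula': ['C10H16', 'C10H16', 'C10H16'],
})

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, 'ms_data.csv')
    # index=False keeps the DataFrame's row index out of the file;
    # UTF-8 preserves the Greek letters in the compound names
    ms_subset.to_csv(csv_path, index=False, encoding='utf-8')
    # Read the file back to confirm the contents survive the round trip
    ms_roundtrip = pandas.read_csv(csv_path, encoding='utf-8')
```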

Adding Complexity

This is just the tip of the iceberg for these classes. Several additional layers of functionality, including formula assignment and reporting, are expanded upon in the Getting Started section. Please continue on to see how you can integrate this simple data manipulation into a more complex analysis.