Working with Data
Many operations in ChromaQuant rely on DataSets, which are objects that contain either
one- or two-dimensional data and offer various methods to collect, set, and manipulate these
data. The simplest DataSet available to users through the API is the Value class, which
is used to instantiate objects that can each contain a singular datum. There is also a Table
class, which stores two-dimensional data using the pandas.DataFrame class.
Values
This class is primarily targeted at users who wish to work with values like floats, integers, strings, and booleans. A Value can be created like so:
my_value = cq.Value()
Data can be added to a Value in a few different ways. For one, data can be passed as an argument during instantiation:
some_data = 14.6
my_value = cq.Value(data=some_data)
Another way is to set the data attribute directly:
new_data = 31.9
my_value.data = new_data
The data attribute also allows for getting and deleting its contents:
# Get the current data
extracted_data = my_value.data
# Delete the current data
del my_value.data
Tables
Tables behave very similarly in some aspects to Values (they are both children of the DataSet class, after all!). The
key difference is that Tables are intended to contain two-dimensional data. The Table class does this by leveraging the
extensive functionality offered by the pandas.DataFrame class. We can create a Table in a similar way to a Value:
my_table = cq.Table()
We can assign data using either the data argument or the data attribute. First, we create a DataFrame using
pandas:
my_dictionary = {'Column A': [1, 2, 3], 'Column B': [4, 5, 6]}
my_dataframe = pandas.DataFrame(my_dictionary)
Then, we can assign this dataframe to our Table:
# Assign using the data argument
my_table = cq.Table(data=my_dataframe)
# Reassign by setting the data attribute
my_table.data = my_dataframe
As with Values, we can also get or delete data from Tables:
# Get the current data
extracted_data = my_table.data
# Delete the current data
del my_table.data
Something else we can do with Tables is read our data directly from .csv files:
# Create a new table
my_table = cq.Table()
# Import the data from a .csv file
my_table.import_csv_data('my_data.csv')
Back to our Example
Let’s return now to our scenario outlines in the previous page. Imagine we are given a sample of bergamot oil to analyze on our GC-MS system. After injecting the sample in the instrument, acquiring data, and performing best hits analysis by comparing collected mass spectra to library spectra, we have the following table of MS results:
Peak Number |
Retention Time (min) |
Compound |
Formula |
|---|---|---|---|
1 |
18.23 |
tricyclene |
C10H16 |
2 |
18.81 |
α-thujene |
C10H16 |
3 |
19.26 |
α-pinene |
C10H16 |
4 |
20.12 |
camphene |
C10H16 |
5 |
21.75 |
sabinene |
C10H16 |
6 |
21.89 |
ß-pinene |
C10H16 |
7 |
22.44 |
6-methyl-5-hepten-2-one |
C8H14O |
8 |
22.61 |
myrcene |
C10H16 |
9 |
23.26 |
octanal |
C8H16O |
10 |
23.31 |
α-phellandrene |
C10H16 |
11 |
24.01 |
δ-3-carene |
C10H16 |
12 |
24.21 |
α-terpinene |
C10H16 |
13 |
24.89 |
p-cymene |
C10H14 |
14 |
25.06 |
limonene |
C10H16 |
15 |
25.13 |
1,8-cineole |
C10H18O |
16 |
25.36 |
(Z)-ß-ocimene |
C10H16 |
17 |
26.00 |
(E)-ß-ocimene |
C10H16 |
18 |
26.81 |
γ-terpinene |
C10H16 |
19 |
27.09 |
cis-sabinene hydrate |
C10H18O |
20 |
27.78 |
octanol |
C8H18O |
21 |
28.43 |
terpinolene |
C10H16 |
22 |
29.11 |
linalool |
C10H18O |
23 |
29.34 |
nonanal |
C9H18O |
24 |
29.72 |
heptyl acetate |
C9H18O2 |
25 |
29.98 |
cis-limonene oxide |
C10H16O |
26 |
38.53 |
linalyl acetate |
C12H20O2 |
27 |
54.49 |
ß-bisabolene |
C15H24 |
We also get the following table of FID integration results:
Peak Number |
Retention Time (min) |
Area |
|---|---|---|
1 |
18.24 |
9.642202419 |
2 |
18.79 |
721.0011595 |
3 |
19.30 |
2626.975549 |
4 |
20.10 |
80.28715918 |
5 |
20.36 |
13.20155492 |
6 |
21.71 |
2599.987859 |
7 |
21.90 |
16464.64891 |
8 |
22.48 |
9.540131909 |
9 |
22.61 |
2383.918994 |
10 |
23.22 |
67.72271269 |
11 |
23.27 |
84.84936771 |
12 |
24.04 |
11.54458953 |
13 |
24.23 |
381.0284691 |
14 |
24.88 |
954.0299001 |
15 |
25.02 |
79746.12728 |
16 |
25.10 |
32.72194488 |
17 |
25.33 |
92.3024417 |
18 |
25.48 |
8.178220039 |
19 |
25.97 |
476.8954081 |
20 |
26.85 |
17449.17202 |
21 |
27.11 |
94.59826749 |
22 |
27.76 |
20.20661954 |
23 |
28.41 |
746.6239539 |
24 |
29.15 |
23250.14097 |
25 |
29.39 |
79.28369498 |
26 |
29.71 |
7.280006075 |
27 |
29.83 |
3.327113495 |
28 |
30.01 |
22.03782356 |
29 |
38.51 |
66953.91378 |
30 |
54.47 |
1183.234388 |
Let’s start with the integration table. We can create a Table instance in Python by first adding our data to a DataFrame:
# Define a dictionary with our data as tabulated above
fid_dictionary = {'Peak Number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
'Retention Time': [18.24, 18.79, 19.30, 20.10, 20.36, 21.71, 21.90, 22.48,
22.61, 23.22, 23.27, 24.04, 24.23, 24.88, 25.02, 25.10,
25.33, 25.48, 25.97, 26.85, 27.11, 27.76, 28.41, 29.15,
29.39, 29.71, 29.83, 30.01, 38.51, 54.47],
'Area': [9.642202419, 721.0011595, 2626.975549, 80.28715918, 13.20155492, 2599.987859
16464.64891, 9.540131909, 2383.918994, 67.72271269, 84.84936771, 11.54458953,
381.0284691, 954.0299001, 79746.12728, 32.72194488, 92.3024417, 8.178220039,
476.8954081, 17449.17202, 94.59826749, 20.20661954, 746.6239539, 23250.14097
79.28369498, 7.280006075, 3.327113495, 22.03782356, 66953.91378, 1183.234388]}
# Create a DataFrame from the dictionary
fid_dataframe = pandas.DataFrame(my_dictionary)
# Create a Table with the data set to the DataFrame
fid_table = cq.Table(data=fid_dataframe)
Alternatively, we could read the data directly from a .csv file if it is in that format. Let’s do that with our MS data:
# Create a new table
ms_table = cq.Table()
# Import the data from a .csv file
ms_table.import_csv_data('ms_data.csv')
Adding Complexity
This is just the tip of the iceberg for these classes. There are several additional layers of functionality—including formula assignment and reporting—that will be expanded upon further in the Getting Started section. Please continue on to see how you can integrate this simple data manipulation in a more complex analysis.