Working with Data

Many operations in ChromaQuant rely on DataSets, objects that hold one- or two-dimensional data and offer methods to collect, set, and manipulate those data. The simplest DataSet available to users through the API is the Value class, whose instances each hold a single datum. There is also a Table class, which stores two-dimensional data using the pandas.DataFrame class.

Values

This class is primarily targeted at users who wish to work with values like floats, integers, strings, and booleans. A Value can be created like so:

my_value = cq.Value()

Data can be added to a Value in a few different ways. For one, data can be passed as an argument during instantiation:

some_data = 14.6
my_value = cq.Value(data=some_data)

Another way is to set the data attribute directly:

new_data = 31.9
my_value.data = new_data

The data attribute also allows for getting and deleting its contents:

# Get the current data
extracted_data = my_value.data
# Delete the current data
del my_value.data
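
For readers curious how one attribute can support getting, setting, and deleting, the sketch below uses Python's built-in property mechanism. ValueSketch is a hypothetical stand-in written for this illustration; it is not ChromaQuant's implementation, and resetting the datum to None on deletion is an assumption made here rather than documented behavior of cq.Value.

```python
class ValueSketch:
    """Hypothetical stand-in for a single-datum container (not ChromaQuant code)."""

    def __init__(self, data=None):
        # Store the datum on a private attribute
        self._data = data

    @property
    def data(self):
        # Runs when the attribute is read: obj.data
        return self._data

    @data.setter
    def data(self, new_data):
        # Runs when the attribute is assigned: obj.data = ...
        self._data = new_data

    @data.deleter
    def data(self):
        # Runs on: del obj.data
        # Assumption: deletion resets the datum to None
        self._data = None


sketch = ValueSketch(data=14.6)
sketch.data = 31.9            # set
extracted = sketch.data       # get
del sketch.data               # delete
after_delete = sketch.data    # None in this sketch
```

Routing all three operations through one property keeps the calling syntax identical to plain attribute access while leaving room for validation or bookkeeping inside the class.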

Tables

Tables behave very similarly in some aspects to Values (they are both children of the DataSet class, after all!). The key difference is that Tables are intended to contain two-dimensional data. The Table class does this by leveraging the extensive functionality offered by the pandas.DataFrame class. We can create a Table in a similar way to a Value:

my_table = cq.Table()

We can assign data using either the data argument or the data attribute. First, we create a DataFrame using pandas:

import pandas

my_dictionary = {'Column A': [1, 2, 3], 'Column B': [4, 5, 6]}
my_dataframe = pandas.DataFrame(my_dictionary)

Then, we can assign this dataframe to our Table:

# Assign using the data argument
my_table = cq.Table(data=my_dataframe)
# Reassign by setting the data attribute
my_table.data = my_dataframe

As with Values, we can also get or delete data from Tables:

# Get the current data
extracted_data = my_table.data
# Delete the current data
del my_table.data

Something else we can do with Tables is read our data directly from .csv files:

# Create a new table
my_table = cq.Table()
# Import the data from a .csv file
my_table.import_csv_data('my_data.csv')
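
To make the expected file layout concrete, here is a minimal pandas-only sketch that writes the small example DataFrame from earlier on this page to a .csv file and reads it back. It does not call ChromaQuant at all; how import_csv_data parses the file internally is not described here, so treat the pandas.read_csv call as an illustration of the file format rather than of the method itself. The temporary directory is used only so the example cleans up after itself.

```python
import os
import tempfile

import pandas

# Same small example data used earlier on this page
example = pandas.DataFrame({'Column A': [1, 2, 3], 'Column B': [4, 5, 6]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'my_data.csv')

    # index=False keeps the DataFrame's row index out of the file
    example.to_csv(csv_path, index=False)

    # Read the raw text to see the layout: a header row, then one row per record
    with open(csv_path) as handle:
        csv_text = handle.read()

    # Read the file back into a DataFrame
    loaded = pandas.read_csv(csv_path)

print(csv_text)
```

A file laid out this way, with column names in the first row, is the usual starting point for tabular data exported from instrument software.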

Back to our Example

Let’s return now to the scenario outlined on the previous page. Imagine we are given a sample of bergamot oil to analyze on our GC-MS system. After injecting the sample into the instrument, acquiring data, and performing a best-hits analysis by comparing the collected mass spectra to library spectra, we have the following table of MS results:

| Peak Number | Retention Time (min) | Compound | Formula |
|---|---|---|---|
| 1 | 18.23 | tricyclene | C10H16 |
| 2 | 18.81 | α-thujene | C10H16 |
| 3 | 19.26 | α-pinene | C10H16 |
| 4 | 20.12 | camphene | C10H16 |
| 5 | 21.75 | sabinene | C10H16 |
| 6 | 21.89 | β-pinene | C10H16 |
| 7 | 22.44 | 6-methyl-5-hepten-2-one | C8H14O |
| 8 | 22.61 | myrcene | C10H16 |
| 9 | 23.26 | octanal | C8H16O |
| 10 | 23.31 | α-phellandrene | C10H16 |
| 11 | 24.01 | δ-3-carene | C10H16 |
| 12 | 24.21 | α-terpinene | C10H16 |
| 13 | 24.89 | p-cymene | C10H14 |
| 14 | 25.06 | limonene | C10H16 |
| 15 | 25.13 | 1,8-cineole | C10H18O |
| 16 | 25.36 | (Z)-β-ocimene | C10H16 |
| 17 | 26.00 | (E)-β-ocimene | C10H16 |
| 18 | 26.81 | γ-terpinene | C10H16 |
| 19 | 27.09 | cis-sabinene hydrate | C10H18O |
| 20 | 27.78 | octanol | C8H18O |
| 21 | 28.43 | terpinolene | C10H16 |
| 22 | 29.11 | linalool | C10H18O |
| 23 | 29.34 | nonanal | C9H18O |
| 24 | 29.72 | heptyl acetate | C9H18O2 |
| 25 | 29.98 | cis-limonene oxide | C10H16O |
| 26 | 38.53 | linalyl acetate | C12H20O2 |
| 27 | 54.49 | β-bisabolene | C15H24 |

We also get the following table of FID integration results:

| Peak Number | Retention Time (min) | Area |
|---|---|---|
| 1 | 18.24 | 9.642202419 |
| 2 | 18.79 | 721.0011595 |
| 3 | 19.30 | 2626.975549 |
| 4 | 20.10 | 80.28715918 |
| 5 | 20.36 | 13.20155492 |
| 6 | 21.71 | 2599.987859 |
| 7 | 21.90 | 16464.64891 |
| 8 | 22.48 | 9.540131909 |
| 9 | 22.61 | 2383.918994 |
| 10 | 23.22 | 67.72271269 |
| 11 | 23.27 | 84.84936771 |
| 12 | 24.04 | 11.54458953 |
| 13 | 24.23 | 381.0284691 |
| 14 | 24.88 | 954.0299001 |
| 15 | 25.02 | 79746.12728 |
| 16 | 25.10 | 32.72194488 |
| 17 | 25.33 | 92.3024417 |
| 18 | 25.48 | 8.178220039 |
| 19 | 25.97 | 476.8954081 |
| 20 | 26.85 | 17449.17202 |
| 21 | 27.11 | 94.59826749 |
| 22 | 27.76 | 20.20661954 |
| 23 | 28.41 | 746.6239539 |
| 24 | 29.15 | 23250.14097 |
| 25 | 29.39 | 79.28369498 |
| 26 | 29.71 | 7.280006075 |
| 27 | 29.83 | 3.327113495 |
| 28 | 30.01 | 22.03782356 |
| 29 | 38.51 | 66953.91378 |
| 30 | 54.47 | 1183.234388 |

Let’s start with the integration table. We can create a Table instance in Python by first adding our data to a DataFrame:

# Define a dictionary with our data as tabulated above
fid_dictionary = {'Peak Number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                                 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
                 'Retention Time': [18.24, 18.79, 19.30, 20.10, 20.36, 21.71, 21.90, 22.48,
                                    22.61, 23.22, 23.27, 24.04, 24.23, 24.88, 25.02, 25.10,
                                    25.33, 25.48, 25.97, 26.85, 27.11, 27.76, 28.41, 29.15,
                                    29.39, 29.71, 29.83, 30.01, 38.51, 54.47],
                 'Area': [9.642202419, 721.0011595, 2626.975549, 80.28715918, 13.20155492, 2599.987859,
                          16464.64891, 9.540131909, 2383.918994, 67.72271269, 84.84936771, 11.54458953,
                          381.0284691, 954.0299001, 79746.12728, 32.72194488, 92.3024417, 8.178220039,
                          476.8954081, 17449.17202, 94.59826749, 20.20661954, 746.6239539, 23250.14097,
                          79.28369498, 7.280006075, 3.327113495, 22.03782356, 66953.91378, 1183.234388]}

# Create a DataFrame from the dictionary
fid_dataframe = pandas.DataFrame(fid_dictionary)

# Create a Table with the data set to the DataFrame
fid_table = cq.Table(data=fid_dataframe)
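
When a table is typed in by hand like this, a quick sanity check before building the DataFrame can catch transcription slips. The sketch below uses only the first five FID peaks from the table above to stay short, and column_lengths is a hypothetical helper written for this illustration, not part of ChromaQuant or pandas. It confirms that every column has the same number of entries, then inspects the resulting DataFrame.

```python
import pandas

# First five FID peaks from the table above (truncated for brevity)
fid_subset = {
    'Peak Number': [1, 2, 3, 4, 5],
    'Retention Time': [18.24, 18.79, 19.30, 20.10, 20.36],
    'Area': [9.642202419, 721.0011595, 2626.975549, 80.28715918, 13.20155492],
}


def column_lengths(columns):
    """Hypothetical helper: map each column name to its number of entries."""
    return {name: len(values) for name, values in columns.items()}


lengths = column_lengths(fid_subset)

# pandas needs every column to have the same length to build the DataFrame
if len(set(lengths.values())) != 1:
    raise ValueError(f'Unequal column lengths: {lengths}')

fid_subset_frame = pandas.DataFrame(fid_subset)

# Confirm the shape and find the peak with the largest area in this subset
n_rows, n_columns = fid_subset_frame.shape
largest_row = fid_subset_frame['Area'].idxmax()
largest_peak = int(fid_subset_frame.loc[largest_row, 'Peak Number'])
```

The same check applies unchanged to the full 30-peak dictionary.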

Alternatively, we could read the data directly from a .csv file if it is in that format. Let’s do that with our MS data:

# Create a new table
ms_table = cq.Table()
# Import the data from a .csv file
ms_table.import_csv_data('ms_data.csv')

Adding Complexity

This is just the tip of the iceberg for these classes. Several additional layers of functionality, including formula assignment and reporting, are covered later in the Getting Started section. Please continue on to see how you can integrate this simple data manipulation into a more complex analysis.