.. _working-with-data:

Working with Data
===============================
.. testsetup:: *

    import chromaquant as cq

Many operations in ChromaQuant rely on :code:`DataSets`, which are objects that contain either
one- or two-dimensional data and offer various methods to collect, set, and manipulate these
data. The simplest DataSet available to users through the API is the :code:`Value` class, whose
instances each contain a single datum. There is also a :code:`Table`
class, which stores two-dimensional data using the :code:`pandas.DataFrame` class.

Values
-------------------------------

The :code:`Value` class is intended for users who wish to work with scalar values such as floats, integers, strings, and booleans.
A Value can be created like so:

.. code-block:: python

    my_value = cq.Value()

Data can be added to a Value in a few different ways. For one, data can be passed as an argument
during instantiation:

.. code-block:: python

    some_data = 14.6
    my_value = cq.Value(data=some_data)

Another way is to set the :code:`data` attribute directly:

.. code-block:: python

    new_data = 31.9
    my_value.data = new_data

The :code:`data` attribute also allows for getting and deleting its contents:

.. code-block:: python

    # Get the current data
    extracted_data = my_value.data
    # Delete the current data
    del my_value.data
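
The get, set, and delete behavior shown above matches what a Python :code:`property` provides. The following
is a minimal sketch of that pattern, not ChromaQuant's actual implementation: the class name :code:`SketchValue`
is hypothetical, and resetting the data to :code:`None` on deletion is an assumption made only for illustration.

.. code-block:: python

    class SketchValue:
        """Hypothetical stand-in for a Value-like object (illustration only)."""

        def __init__(self, data=None):
            self._data = data

        @property
        def data(self):
            # Getter: return the stored datum
            return self._data

        @data.setter
        def data(self, new_data):
            # Setter: replace the stored datum
            self._data = new_data

        @data.deleter
        def data(self):
            # Deleter: assumed here to reset the datum to None
            self._data = None


    sketch = SketchValue(data=14.6)
    sketch.data = 31.9
    extracted_data = sketch.data
    del sketch.data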

Tables
-------------------------------

Tables behave much like Values in some respects (both are children of the DataSet class, after all!). The
key difference is that Tables are intended to contain two-dimensional data. The Table class does this by leveraging the
extensive functionality offered by the :code:`pandas.DataFrame` class. We can create a Table in a similar way to a Value:

.. code-block:: python

    my_table = cq.Table()

We can assign data using either the :code:`data` argument or the :code:`data` attribute. First, we create a DataFrame using
pandas:

.. code-block:: python

    import pandas

    my_dictionary = {'Column A': [1, 2, 3], 'Column B': [4, 5, 6]}
    my_dataframe = pandas.DataFrame(my_dictionary)

Then, we can assign this dataframe to our Table:

.. code-block:: python

    # Assign using the data argument
    my_table = cq.Table(data=my_dataframe)
    # Reassign by setting the data attribute
    my_table.data = my_dataframe

As with Values, we can also get or delete data from Tables:

.. code-block:: python

    # Get the current data
    extracted_data = my_table.data
    # Delete the current data
    del my_table.data

Something else we can do with Tables is read our data directly from .csv files:

.. code-block:: python

    # Create a new table
    my_table = cq.Table()
    # Import the data from a .csv file
    my_table.import_csv_data('my_data.csv')

Back to our Example
-------------------------------

Let's return now to the scenario outlined on the previous page. Imagine we are given a sample of bergamot oil to analyze
on our GC-MS system. After injecting the sample into the instrument, acquiring data, and performing a best-hits analysis by
comparing collected mass spectra to library spectra, we have the following table of MS results:

=========== ==================== ======================= ========
Peak Number Retention Time (min) Compound                Formula
=========== ==================== ======================= ========
1           18.23                tricyclene              C10H16
2           18.81                α-thujene               C10H16
3           19.26                α-pinene                C10H16
4           20.12                camphene                C10H16
5           21.75                sabinene                C10H16
6           21.89                β-pinene                C10H16
7           22.44                6-methyl-5-hepten-2-one C8H14O
8           22.61                myrcene                 C10H16
9           23.26                octanal                 C8H16O
10          23.31                α-phellandrene          C10H16
11          24.01                δ-3-carene              C10H16
12          24.21                α-terpinene             C10H16
13          24.89                p-cymene                C10H14
14          25.06                limonene                C10H16
15          25.13                1,8-cineole             C10H18O
16          25.36                (Z)-β-ocimene           C10H16
17          26.00                (E)-β-ocimene           C10H16
18          26.81                γ-terpinene             C10H16
19          27.09                cis-sabinene hydrate    C10H18O
20          27.78                octanol                 C8H18O
21          28.43                terpinolene             C10H16
22          29.11                linalool                C10H18O
23          29.34                nonanal                 C9H18O
24          29.72                heptyl acetate          C9H18O2
25          29.98                cis-limonene oxide      C10H16O
26          38.53                linalyl acetate         C12H20O2
27          54.49                β-bisabolene            C15H24
=========== ==================== ======================= ========

We also get the following table of FID integration results:

=========== ==================== ===========
Peak Number Retention Time (min) Area
=========== ==================== ===========
1           18.24                9.642202419
2           18.79                721.0011595
3           19.30                2626.975549
4           20.10                80.28715918
5           20.36                13.20155492
6           21.71                2599.987859
7           21.90                16464.64891
8           22.48                9.540131909
9           22.61                2383.918994
10          23.22                67.72271269
11          23.27                84.84936771
12          24.04                11.54458953
13          24.23                381.0284691
14          24.88                954.0299001
15          25.02                79746.12728
16          25.10                32.72194488
17          25.33                92.3024417
18          25.48                8.178220039
19          25.97                476.8954081
20          26.85                17449.17202
21          27.11                94.59826749
22          27.76                20.20661954
23          28.41                746.6239539
24          29.15                23250.14097
25          29.39                79.28369498
26          29.71                7.280006075
27          29.83                3.327113495
28          30.01                22.03782356
29          38.51                66953.91378
30          54.47                1183.234388
=========== ==================== ===========

Let's start with the integration table. We can create a Table instance in Python by first adding our data to a DataFrame:

.. code-block:: python

    import pandas

    # Define a dictionary with our data as tabulated above
    fid_dictionary = {'Peak Number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                                     16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
                     'Retention Time': [18.24, 18.79, 19.30, 20.10, 20.36, 21.71, 21.90, 22.48,
                                        22.61, 23.22, 23.27, 24.04, 24.23, 24.88, 25.02, 25.10,
                                        25.33, 25.48, 25.97, 26.85, 27.11, 27.76, 28.41, 29.15,
                                        29.39, 29.71, 29.83, 30.01, 38.51, 54.47],
                     'Area': [9.642202419, 721.0011595, 2626.975549, 80.28715918, 13.20155492, 2599.987859,
                              16464.64891, 9.540131909, 2383.918994, 67.72271269, 84.84936771, 11.54458953,
                              381.0284691, 954.0299001, 79746.12728, 32.72194488, 92.3024417, 8.178220039,
                              476.8954081, 17449.17202, 94.59826749, 20.20661954, 746.6239539, 23250.14097,
                              79.28369498, 7.280006075, 3.327113495, 22.03782356, 66953.91378, 1183.234388]}

    # Create a DataFrame from the dictionary
    fid_dataframe = pandas.DataFrame(fid_dictionary)

    # Create a Table with the data set to the DataFrame
    fid_table = cq.Table(data=fid_dataframe)
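
Long hand-typed column lists are easy to get wrong, and pandas raises a :code:`ValueError` when the lists in a
dictionary differ in length. A quick check like the following, shown here on only the first five FID peaks, can catch
a dropped or duplicated entry before the DataFrame is built. The variable names are illustrative and are not part of
ChromaQuant.

.. code-block:: python

    # First five rows of the FID table above (shortened for illustration)
    fid_subset = {'Peak Number': [1, 2, 3, 4, 5],
                  'Retention Time': [18.24, 18.79, 19.30, 20.10, 20.36],
                  'Area': [9.642202419, 721.0011595, 2626.975549, 80.28715918, 13.20155492]}

    # Map each column name to its number of entries
    column_lengths = {name: len(values) for name, values in fid_subset.items()}

    # Every column should have one entry per peak
    expected_length = len(fid_subset['Peak Number'])
    mismatched = [name for name, length in column_lengths.items() if length != expected_length]
    if mismatched:
        raise ValueError(f'Columns with unexpected length: {mismatched}')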

Alternatively, we could read the data directly from a .csv file if it is in that format. Let's do that with our MS data:

.. code-block:: python

    # Create a new table
    ms_table = cq.Table()
    # Import the data from a .csv file
    ms_table.import_csv_data('ms_data.csv')
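
For reference, a file such as :code:`ms_data.csv` could be produced with Python's built-in :code:`csv` module. The
sketch below writes only the first three MS peaks and assumes a header row whose names match the table columns above.
The exact layout that :code:`import_csv_data` expects is not specified here, so treat the column names as an assumption.

.. code-block:: python

    import csv

    # Header and first three rows of the MS results table (shortened for illustration)
    header = ['Peak Number', 'Retention Time (min)', 'Compound', 'Formula']
    rows = [[1, 18.23, 'tricyclene', 'C10H16'],
            [2, 18.81, 'α-thujene', 'C10H16'],
            [3, 19.26, 'α-pinene', 'C10H16']]

    # newline='' lets the csv module control line endings; UTF-8 keeps the Greek letters intact
    with open('ms_data.csv', 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(header)
        writer.writerows(rows)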

Adding Complexity
-------------------------------

This is just the tip of the iceberg for these classes. There are several additional layers of functionality, including
formula assignment and reporting, that are covered later in the Getting Started section. Please continue on
to see how you can integrate this simple data manipulation into a more complex analysis.