Using the VHlab database, aka the 'experiment' file

As described here, the VH lab database for a particular experiment resides in the [DIRNAME]/analysis/experiment file. This file is in Matlab format, although it doesn’t contain the .mat extension (see notes on Matlab files that lack the .mat extension).

Using the dirstruct interface can make reading from and writing to the database easier

One can create a new dirstruct object using the following:

ds = dirstruct([DIRNAME]);

An easy way to get the full filename for the experiment file is

expfilename = getexperimentfile(ds)
% returns '[DIRNAME]/analysis/experiment'
% you can use getexperimentfile(ds,1) to create the experiment file if it does not exist

Reading the entire contents of the experiment file:

One can read all of the variables in the experiment file using the following standard Matlab command:

mydata = load(expfilename,'-mat');
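Because the experiment file has no .mat extension, the '-mat' flag matters both when writing and when reading. The sketch below builds a throwaway file with the same layout (the experiment name and the two cell variables are hypothetical placeholders, with plain numbers standing in for real cell objects) and shows that load also accepts the * wildcard, so you can read just the variables you need instead of the entire file.

```matlab
% Build a throwaway stand-in for [DIRNAME]/analysis/experiment.
% All names and values here are hypothetical placeholders.
expfilename = fullfile(tempname, 'experiment');
mkdir(fileparts(expfilename));
name = '2008-05-20';                      % hypothetical experiment name
cell_extra_001_001_2008_05_20 = 1;        % numbers stand in for cell objects
cell_extra_001_002_2008_05_20 = 2;
% '-mat' selects MAT format explicitly, so save keeps the bare filename
% instead of appending .mat
save(expfilename, 'name', 'cell_extra_001_001_2008_05_20', ...
     'cell_extra_001_002_2008_05_20', '-mat');

% Read everything, as in the text above
mydata = load(expfilename, '-mat');

% Or read only the variables whose names match a pattern
justcells = load(expfilename, '-mat', 'cell*');
disp(fieldnames(justcells))
```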

What’s usually in the experiment file

The experiment file can contain anything in principle, but typically contains 2 types of variables:

    • a variable called name, which is a string equal to the name of the experiment

    • several variables for cells, named like cell_NAME_REF_YYY_20XX_XX_XX, where the X’s indicate the date, NAME and REF are the name/reference pair that identifies the electrode from which the cell was recorded, and YYY is the “cluster” number of the cell on that electrode (some electrodes can record multiple cells at the same time, and the cluster numbers distinguish these cells; for electrodes that can record only a single cell, this number is always 001).

    • The range of the cluster number YYY depends on the acquisition system used to acquire the data:

      • For neurons acquired via Spike2, the numbers are 0-49.

      • For neurons acquired via LabView and spike sorted with our own software, the numbers are 50-100.

      • For neurons acquired via Spike2 and spike-sorted in Plexon, the numbers are 200-250.

      • For neurons acquired via LabView and spike-sorted in Plexon, the numbers are 400 and higher.

      • For neurons acquired with 2-photon imaging, this is just the ID number of the cell in our analyzetpstack software (http://github.com/VH-Lab/vhlab-TwoPhoton-matlab).
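The naming convention and the cluster-number ranges above can be decoded with built-in MATLAB functions. The sketch below uses a hypothetical cell name. It assumes REF and YYY are written as digits, and it applies only the electrode ranges listed above. 2-photon ID numbers can fall in any of those ranges, so this check cannot identify imaging data.

```matlab
% Hypothetical cell variable name following cell_NAME_REF_YYY_20XX_XX_XX
cellname = 'cell_extra_001_003_2008_05_20';

% Anchor the numeric fields at the end so that NAME may contain underscores
pat = ['^cell_(?<name>.+)_(?<ref>\d+)_(?<cluster>\d+)' ...
       '_(?<year>\d{4})_(?<month>\d{2})_(?<day>\d{2})$'];
tok = regexp(cellname, pat, 'names');

% Map the cluster number onto the electrode acquisition ranges
% (2-photon ID numbers are not distinguishable by range)
cluster = str2double(tok.cluster);
if cluster >= 0 && cluster <= 49
    source = 'Spike2';
elseif cluster >= 50 && cluster <= 100
    source = 'LabView, sorted with in-lab software';
elseif cluster >= 200 && cluster <= 250
    source = 'Spike2, sorted in Plexon';
elseif cluster >= 400
    source = 'LabView, sorted in Plexon';
else
    source = 'unknown';
end
fprintf('%s: electrode %s/%s, cluster %d (%s)\n', ...
        cellname, tok.name, tok.ref, cluster, source);
```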

There are no true restrictions on what can be placed in variables in the experiment file, but as a rule of thumb, data that is particularly large or cumbersome is not a good choice for inclusion. For example, for a cell that is recorded extracellularly, we typically store all spike times in the corresponding record in the database, but we do not store the raw data. Likewise, for 2-photon data, we store the raw fluorescence brightness values for the region of interest that includes a given cell, but we do not store the raw image data in the database. Instead, in both cases, we store the directory information that a program needs to access the raw data, if necessary.

A useful tool for reading a list of cells

The most common element that is read from the database is a list of cells. It is often convenient to read in a list of cell names and the associated cell data all at once, so I wrote a function that does this:

[celldata,cellnames] = load2celllist(getexperimentfile(ds),'cell*','-mat');

The ‘cell*’ string tells the function to read only variable names within the experiment file that start with ‘cell’; this command will read in all of the cells in the experiment into the variables celldata and cellnames.

Note that the ‘cell’ in load2celllist refers to the Matlab data type ‘cell’ (a cell array, that is, a container whose elements can each hold data of any type and size); it is only a coincidence that it is commonly used to read in data that describes cells in the nervous system.
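To show what the two outputs contain, here is a rough built-in-only approximation of load2celllist(expfilename,'cell*','-mat'). This is an assumption about the shape of its outputs, not its actual source code; for example, the real function may order or orient the lists differently. The file it reads is a throwaway stand-in with hypothetical contents.

```matlab
% Throwaway stand-in for the experiment file (hypothetical contents)
expfilename = fullfile(tempname, 'experiment');
mkdir(fileparts(expfilename));
name = 'demo';
cell_extra_001_001_2008_05_20 = 1;
cell_extra_001_002_2008_05_20 = 2;
save(expfilename, 'name', 'cell_extra_001_001_2008_05_20', ...
     'cell_extra_001_002_2008_05_20', '-mat');

% Approximate the two outputs using only built-in functions
S = load(expfilename, '-mat', 'cell*');
cellnames = fieldnames(S);     % N-by-1 cell array of variable names
celldata  = struct2cell(S);    % N-by-1 cell array of values, same order

for i = 1:numel(cellnames)
    fprintf('%s -> %s\n', cellnames{i}, class(celldata{i}));
end
```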

The measureddata object format and its “children”

If you type celldata{1}, you will see that each cell is an object of one of the following data types: spikedata or, most likely, cksmultiunit. Both are based on (are “children” of) a data type that SDV built called measureddata, which provides common functionality for programmers working with acquired data. The measureddata type provides 3 important services:

  1. Time interval management: keeps track of when data were recorded for this record (and, at the same time, of when there are no data for this record). The functions that users can call to obtain or set this information are get_intervals and set_intervals, respectively. Of these, only get_intervals is typically called by the end user; set_intervals is normally called when creating a measureddata object.

  2. Data access: the function get_data reads the data available for this record between 2 time points, T0 and T1. The type of data returned depends on which “child” data type is being used; spikedata and cksmultiunit return a list of all spike times in the interval [T0…T1], as in myspikes = get_data(myspikedataobject,[T0 T1]); If data were not acquired continuously over [T0…T1], get_data produces an error, because there are times in this interval when we do not know whether or not a spike occurred. Sometimes it is helpful to override this “feature” of interval management and return all known spikes in [T0…T1], regardless of whether acquisition was continuous in this interval; this can be done by passing an additional parameter: myspikes = get_data(myspikedataobject,[T0 T1],2);

  3. Arbitrary data can be associated with a measureddata object using the associate function. This is the feature that makes measureddata objects a database. Each associate is a structure with 4 fields:

    1. type: a string indicating the type

    2. owner: a string indicating the program that created it

    3. data: the data for the associate

    4. desc: a description, to help the user remember what is in it

Associates can be read using the findassociate function. Called as A = findassociate(mymd,'','',''), with all search arguments blank, it returns all associates; to search by field, use A = findassociate(mymd,[type],[owner],[desc]). Any argument left empty ('') is not used in the search (that is, it means “any type”, “any owner”, or “any desc”).
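The interval rule described for get_data can be modeled with plain vectors. This is an illustration of the behavior, not get_data's implementation. It assumes that recorded intervals are held as one [start end] row per block of continuous acquisition, and that overlapping or touching blocks have been merged; the actual format returned by get_intervals may differ. All times are hypothetical, in seconds.

```matlab
intervals  = [0 100; 150 300];    % hypothetical recorded blocks
spiketimes = [5 50 99 160 250];   % hypothetical spike times
T = [50 200];                     % requested window [T0 T1]
mode = 2;                         % 2 = return known spikes despite gaps

% The window is fully covered only if a single (merged) block contains it
covered = any(intervals(:,1) <= T(1) & intervals(:,2) >= T(2));

if ~covered && mode ~= 2
    % Default behavior: refuse, since spikes in the gap are unknown
    error('Data were not acquired continuously over [%g %g].', T(1), T(2));
end
myspikes = spiketimes(spiketimes >= T(1) & spiketimes <= T(2));
```

Setting mode to anything other than 2 makes this window raise the error, which mirrors calling get_data without the extra parameter.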

Examples of reading from associates and using the information to perform additional analysis

[Future text here]

Adding associates to a measureddata object (example)

One can add a new associate to a measureddata object using the associate command. For example, we can make a structure object for our associate:

assoc.type = 'My new type';
assoc.owner = 'Me';
assoc.data = [1 2 3 4 5];
assoc.desc = 'A simple vector of example data';
mymd = associate(mymd, assoc);
% the associate assoc is added to the measureddata object mymd

Equivalently, one may use the long form without first making a structure object:

mymd = associate(mymd,'My new type','Me',[1 2 3 4 5],'A simple vector of example data');

If the associate type already exists, then the new associate will replace the old.
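The replace-by-type rule can be pictured with a plain struct array that has the same 4 fields. This is only an illustration of the rule stated above, not the toolbox's associate implementation. The updated data values are hypothetical.

```matlab
% Existing associates, modeled as a struct array (illustration only)
assocs = struct('type', {'My new type'}, 'owner', {'Me'}, ...
                'data', {[1 2 3 4 5]}, 'desc', {'A simple vector of example data'});

% A new associate with the same type (hypothetical updated data)
newassoc = struct('type', 'My new type', 'owner', 'Me', ...
                  'data', [10 20], 'desc', 'Updated example data');

match = find(strcmp({assocs.type}, newassoc.type), 1);
if isempty(match)
    assocs(end+1) = newassoc;    % new type: append
else
    assocs(match) = newassoc;    % existing type: replace the old one
end
```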

Removing associates from a measureddata object

One removes associates from a measureddata object using the disassociate command. First, you must know the indexes of the associates you wish to remove; findassociate returns these as its second output. For example, to remove all associates, use:

[A,I] = findassociate(mymd,'','','');
mymd = disassociate(mymd,I);

To remove a specific associate, you can search for it:

[A,I] = findassociate(mymd,'My type','','');
if ~isempty(I), mymd = disassociate(mymd,I); end
% we check to see if I is empty first:
% it is possible that this cell does not have an entry called 'My type', and we don't want to remove something random.
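The search-then-remove pattern can be modeled on a plain struct array to show how empty search fields act as wildcards. This models the findassociate/disassociate behavior described above; it does not call the real functions. The associates and owner names are hypothetical. Here every associate owned by 'Me' is removed, whatever its type or description.

```matlab
% Hypothetical associates with the 4 standard fields
assocs = struct('type',  {'My type', 'Other type', 'Third type'}, ...
                'owner', {'Me', 'analysisprog', 'Me'}, ...
                'data',  {1, 2, 3}, ...
                'desc',  {'first', 'second', 'third'});

% An empty query string matches any value, as with findassociate's arguments
fieldmatch = @(values, query) isempty(query) | strcmp(values, query);

% Search by owner only: type and desc are left empty
I = find(fieldmatch({assocs.type}, '') & ...
         fieldmatch({assocs.owner}, 'Me') & ...
         fieldmatch({assocs.desc}, ''));

if ~isempty(I)          % guard against removing nothing by mistake
    assocs(I) = [];     % analogous to mymd = disassociate(mymd,I)
end
```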

For your interest: how our online analysis/data import programs first create measureddata objects

[Future text here]

Performing an operation on all of the cells in the database:

The following code snippet calls the function performfunction (a made-up function) for all of the cells that were loaded from the database above (using the load2celllist example):

newcelldata = cell(size(celldata));   % preallocate the new list
for i = 1:numel(celldata)
    newcelldata{i} = performfunction(celldata{i}, cellnames{i});
end

This code could potentially add associates to each cell, or modify the existing associates. You could have just used the same variable celldata{i} instead of making a new cell list newcelldata, but I sometimes prefer to make a new list in case an error occurs partway through the loop. This helps to clearly differentiate data that was recently read from disk from any potentially modified data that only exists in memory.
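If you want the loop to survive a failure on one cell instead of stopping, you can wrap the call in try/catch. In the sketch below, performfunction is still a hypothetical stand-in, defined as an anonymous function, and the cell list contains dummy values, one of which is chosen to trigger an error. The design choice shown here is to keep the unmodified cell for any failure, so newcelldata remains complete and safe to save, and to record which cells need attention.

```matlab
% Hypothetical stand-ins so the sketch runs on its own
performfunction = @(c, n) c * 2;      % placeholder for the made-up function
celldata  = {1, 2, {3}};              % {3} makes c*2 fail (no * for cell arrays)
cellnames = {'cell_a', 'cell_b', 'cell_c'};

newcelldata = cell(size(celldata));
failed = {};
for i = 1:numel(celldata)
    try
        newcelldata{i} = performfunction(celldata{i}, cellnames{i});
    catch err
        % Keep the original data for this cell and note the failure
        newcelldata{i} = celldata{i};
        failed{end+1} = cellnames{i}; %#ok<AGROW>
        fprintf('Skipped %s: %s\n', cellnames{i}, err.message);
    end
end
```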

Writing or rewriting variables to the experiment file

If you have new or modified variables to write back to the experiment file, you can use the function

saveexpvar(ds, vardata, varnames)

to save a cell list of variables vardata with the names varnames to the experiment file associated with the experiment ds.

In the example above, where we ran a function on all cells and generated new associates or modified associates, we might use the following code to re-save the data to the experiment file:

saveexpvar(ds, newcelldata, cellnames)

Note that saveexpvar creates an “experiment-lock” file in the same directory as “experiment” so that other users cannot save data to the same file at the same time (which could corrupt the file). However, if an error occurs during your save, you may need to remove the lock file manually (see help saveexpvar).
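A stale lock can be removed from MATLAB with built-in functions. This sketch assumes the lock is a plain file named experiment-lock that sits next to the experiment file, as described above. To run on its own, it uses a throwaway path and simulates the leftover lock; in practice, expfilename would come from getexperimentfile(ds). Only delete the lock once you are sure no one else is in the middle of a save.

```matlab
% Throwaway path; in practice use expfilename = getexperimentfile(ds)
expfilename = fullfile(tempname, 'experiment');
mkdir(fileparts(expfilename));

% Assumed lock-file location: same directory, named experiment-lock
lockfile = fullfile(fileparts(expfilename), 'experiment-lock');
fclose(fopen(lockfile, 'w'));   % simulate a lock left behind by a failed save

% Remove it only after confirming that no save is in progress
if exist(lockfile, 'file') == 2
    delete(lockfile);
end
```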