UltraScan Version

Manual


SOMO HPLC-SAXS Module:

Last updated: March 2016

NOTICE: this module is being developed by E. Brookes, J. Pérez, P. Vachette, and M. Rocco.
Portions of this help file are taken from the Supplementary Materials of Brookes et al., "Fibrinogen species as resolved by HPLC-SAXS data processing within the UltraScan SOlution MOdeler (US-SOMO) enhanced SAS module", J. Appl. Cryst. 46:1823-1833 (2013), and of Brookes et al., "US-SOMO HPLC-SAXS Module: Dealing with Capillary Fouling, and Extraction of Pure Component Patterns from Poorly Resolved SEC-SAXS Data", under revision at J. Appl. Cryst., 2016.

SOMO HPLC-SAXS Lysozyme I(q) data

This US-SOMO module was conceived for the analysis of HPLC-SAXS data. The image above shows the main panel of the HPLC-SAXS module. The buttons with black labels are currently active; those with red labels become active when allowed by the processing/visualization stage. The graphics panel shows a collection of HPLC-SAXS log10[I(q)] vs. q SAXS data frames (points with 0 or negative values are automatically omitted, from the visualization only) for a chicken egg-white lysozyme chromatographic separation on an Agilent BioSec-3 (3 μm particle size, 300 Å pore size) 4.6 × 300 mm column, eluted with Hepes 50 mM, NaCl 100 mM, pH 7. Note the permanent upturn at very small q-values, due to biological material aggregated by the intense X-ray beam on the capillary cell walls under these far-from-optimal experimental conditions. While this kind of problem should preferably be (and has been) dealt with at the experimental level, we use this dataset to demonstrate the potential for correcting data that still present such an issue.

The left side of the window is divided into three sections, labeled "Data files", "Produced Data", and "Messages". Clicking on one of these labels hides the panel below it, allowing the remaining panel(s) to expand. If every panel is hidden, the main graph expands to cover the full size of the HPLC-SAXS window. Clicking on the labels again restores the corresponding panels.

In the top left panel (Data files) there are four buttons:

The Add files button is used to load data into the module. An operating directory can be pre-selected by clicking on the path shown above it and navigating the file system (selecting the Lock checkbox will fix that directory). The SAXS data format recognized by the US-SOMO HPLC-SAXS module consists of .dat files with two or three TAB- or space-separated columns containing the q, I(q), and, optionally, the associated standard deviation (SD) values, respectively. Each frame number (or time value) must be present somewhere in the filename, between a common prefix and suffix. For example, data1saxs.dat, data2saxs.dat, data3saxs.dat will be recognized as frames 1, 2, 3, where "data" and "saxs" can be replaced by any common sequence of characters. Consequently, 1.dat, 2.dat, 3.dat would be acceptable, but abc1.dat, qrs2.dat, xyz3.dat would not, because the prefix characters are not common. The loader will also arrange the data files sequentially, in increasing frame number (or time value) order. Concentration-related data should instead be uploaded using the Concentration load button (see below). I(q) vs. q and concentration data frames are automatically recognized, and the labels on the x- and y-axes are set accordingly.
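The prefix/suffix rule above can be sketched in Python as follows. This is a minimal illustration, not the module's actual loader: `sort_by_frame` is a hypothetical helper that takes the longest prefix and suffix shared by all filenames (trimming any digits they absorbed from the frame numbers), requires the remaining middle part to be an integer, and sorts the files numerically. Decimal time values are not handled in this sketch.

```python
import os
import re


def sort_by_frame(filenames):
    """Return (frame, filename) pairs sorted by frame number.

    Hypothetical helper illustrating the common prefix/suffix rule.
    """
    digits = "0123456789"
    # Longest prefix shared by all names; drop trailing digits that belong to frame numbers.
    prefix = os.path.commonprefix(filenames).rstrip(digits)
    # Longest shared suffix = common prefix of the reversed names, reversed back.
    suffix = os.path.commonprefix([name[::-1] for name in filenames])[::-1].lstrip(digits)
    frames = []
    for name in filenames:
        middle = name[len(prefix):len(name) - len(suffix)]
        if not re.fullmatch(r"\d+", middle):
            raise ValueError(f"{name!r}: no frame number between {prefix!r} and {suffix!r}")
        frames.append((int(middle), name))
    return sorted(frames)


# Frames 1, 2, 10 come out in numerical, not alphabetical, order.
print(sort_by_frame(["data10saxs.dat", "data2saxs.dat", "data1saxs.dat"]))
```

With abc1.dat, qrs2.dat, xyz3.dat the common prefix is empty, the middle part "abc1" is not a number, and the sketch raises an error, matching the rule stated above.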

Similar will select files with similar names and allow manual pattern matching entry if no new similar files are selected.

Concentration will show every listed file together with its associated concentration (mg/ml), if appropriate and properly set (see below). Concentrations can also be entered and modified manually, and can be used to normalize the I(q) vs. q data (see below). Loaded files can be displayed on the graphics panel by individually clicking on them (shift-click selects a contiguous series, ctrl-click allows multiple irregularly spaced selections). Produced data will also show up in this panel with associated putative filenames.

Remove files will discard the previously selected files (see below); if the files were produced by the module and not previously saved, a warning window will pop up, allowing the user to proceed with or cancel the removal of the selected items.

Several buttons are available in the panel below the loaded files window:

SOMO HPLC-SAXS files commands panel

Sel. all will select all files.

Sel. Unsel. will allow toggling the selection between selected files and everything else not currently selected.

Adv. Sel. will open up a panel in which several selection options can be utilized (see here).

View, active when up to ten datasets are selected, will show them in text format.

Movie: pressing this button opens a pop-up window with commands for viewing a series of selected data files in a movie-like manner in the main graphics window of the US-SOMO HPLC-SAXS module, and for optionally saving each frame as an image for actual movie-making (see here).

The Log X (Lin X) and Log Y (Lin Y) buttons toggle between linear and log10 scaling of the data on the x- and y-axes, respectively (zero or negative values, which cannot be displayed on a log10 scale, are temporarily removed while that scale is active). Once pressed, each button changes its label to indicate the action currently available.

Selecting the Err checkbox, active when up to 10 files are selected, switches their representation from points connected by lines to symbols (diamonds) with their associated SDs shown as error bars.

Rescale adjusts the X-Y axes on the graphics window to maximize the display of selected datasets (no effect on the data themselves).

Normalize will divide the I(q) data by the stored/entered concentrations.

Average will produce a weighted average, with propagated SDs, of the selected data. The resulting dataset filename will contain the number of frames averaged and the initial and final frame numbers, followed by "_avg".
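The exact weighting scheme used by Average is not stated here. A common choice, assumed in the sketch below, is inverse-variance weighting: each frame is weighted by 1/SD², and the SD of the mean is 1/sqrt(Σ 1/SD²). The helper name `weighted_average` is hypothetical.

```python
import numpy as np


def weighted_average(intensities, sds):
    """Inverse-variance weighted mean of frames on a common q grid (assumed scheme).

    intensities, sds: arrays of shape (n_frames, n_q).
    Returns the averaged I(q) and its propagated SD, each of shape (n_q,).
    """
    i_q = np.asarray(intensities, dtype=float)
    sd_q = np.asarray(sds, dtype=float)
    weights = 1.0 / sd_q ** 2
    total = weights.sum(axis=0)
    mean = (weights * i_q).sum(axis=0) / total
    mean_sd = 1.0 / np.sqrt(total)  # error propagation for independent frames
    return mean, mean_sd
```

Frames with smaller SDs dominate the result. With equal SDs this reduces to the plain mean, with the SD divided by the square root of the number of frames.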

To SOMO/SAS will transfer selected datasets back into the US-SOMO SAS panel.

Each press of the Width button increases the line (or symbol) size of the plots, until the size cycles back to its initial value.

Color shifts the colors used in the graphics window; the operation can be repeated until a better contrast with the background is achieved. Note that the background color can be changed by right-clicking on the plot borders, which will open up a pop-up dialogue panel where all plot characteristics can be modified.

Bin allows averaging adjacent points in I(q) datasets, starting with the first point in the file and using a binning size defined in a pop-up dialogue:

SOMO HPLC-SAXS binning window panel
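A minimal sketch of binning with SD propagation follows. It assumes that each bin averages `size` consecutive q, I(q) values starting from the first point, and that an incomplete last bin is dropped (how the module treats leftover points is not stated). The SD of each bin mean is taken as sqrt(Σσ²)/size, which assumes independent errors. The helper name is hypothetical.

```python
import numpy as np


def bin_curve(q, intensity, sd, size):
    """Average `size` adjacent points, starting with the first point of the curve."""
    q, intensity, sd = (np.asarray(a, dtype=float) for a in (q, intensity, sd))
    n_bins = len(q) // size  # assumption: leftover points at the end are dropped
    end = n_bins * size

    def blocks(a):
        # One row per bin, `size` consecutive points per row.
        return a[:end].reshape(n_bins, size)

    q_b = blocks(q).mean(axis=1)
    i_b = blocks(intensity).mean(axis=1)
    sd_b = np.sqrt((blocks(sd) ** 2).sum(axis=1)) / size  # SD of the mean of each bin
    return q_b, i_b, sd_b
```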

Smooth regularizes the selected data using a moving window of 2n+1 points with a Gaussian smoothing kernel; the window size is defined in a pop-up dialogue (shown below).

SOMO HPLC-SAXS smoothing window panel
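The smoothing step can be sketched as a normalized, Gaussian-weighted moving average over 2n+1 points. The Gaussian width is not given in this help text, so sigma = n/2 is an arbitrary assumption here. The edge treatment is also an assumption: near the ends of the curve, the kernel is truncated and renormalized.

```python
import numpy as np


def gaussian_smooth(y, n, sigma=None):
    """Gaussian-weighted moving average over a window of 2n+1 points.

    Assumptions: sigma defaults to n/2, and near the ends the kernel is
    truncated and renormalized over the points that exist.
    """
    y = np.asarray(y, dtype=float)
    if sigma is None:
        sigma = max(n / 2.0, 1e-12)  # tiny sigma for n = 0 leaves the data unchanged
    offsets = np.arange(-n, n + 1)
    kernel = np.exp(-offsets ** 2 / (2.0 * sigma ** 2))
    out = np.empty_like(y)
    for i in range(len(y)):
        lo, hi = max(0, i - n), min(len(y), i + n + 1)
        # Kernel entries that overlap the available points around index i.
        k = kernel[lo - i + n:hi - i + n]
        out[i] = np.dot(k, y[lo:hi]) / k.sum()
    return out
```

Because the weights are renormalized, a constant curve is returned unchanged, and an isolated spike is spread symmetrically over its neighbours.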

SVD opens a pop-up window where a singular value decomposition analysis (e.g., Williamson et al., Biophys J. 94, 4906-4923, 2008) can be performed on the selected data (see here). Important: all the data must be on the same grid; if not, the following warning message will appear in the bottom left Messages window: "SVD: curves must be on the same grid, try 'Crop Common' first" (see below for the use of the Crop Common button).
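Conceptually, SVD factors the matrix whose columns are the I(q) frames. The number of singular values standing clearly above the noise hints at the number of independent scattering components. The sketch below uses NumPy's `numpy.linalg.svd` and mimics the same-grid check with the warning text quoted above. The function name and the grid tolerance are assumptions.

```python
import numpy as np


def svd_frames(q_grids, intensities, tol=1e-9):
    """SVD of a matrix whose columns are I(q) frames sharing one q grid."""
    q0 = np.asarray(q_grids[0], dtype=float)
    for q in q_grids[1:]:
        q = np.asarray(q, dtype=float)
        if q.shape != q0.shape or not np.allclose(q, q0, rtol=0.0, atol=tol):
            raise ValueError("SVD: curves must be on the same grid, try 'Crop Common' first")
    a = np.column_stack([np.asarray(i, dtype=float) for i in intensities])
    # Columns of u are basis curves in q, s holds the singular values (descending),
    # and rows of vt give the weight of each basis curve across the frames.
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u, s, vt
```

For frames built as mixtures of two fixed curves, only the first two singular values are significantly different from zero.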

Make I(t) is one of the crucial operations in the HPLC-SAXS module. It generates a series of "chromatograms" (I(t) vs. t, where t can be the real elution time or the frame number), one for each q-value present in the original data files (see below). Each time I(q) vs. q datasets are converted into I(t) vs. t datasets, a test can be automatically performed to ascertain whether each I(t) vs. t "chromatogram" contains useful data, based on a comparison between the signal and its associated SDs. This test is enabled by selecting its checkbox, and setting the SD factor, in the Options menu accessible from the button at the bottom of this window (see here).
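The Options entry discussed further below reads "discard I(t) with no signal above st. dev. multiplied by 2.5". This suggests a criterion along the lines of the sketch below: a chromatogram is kept only if at least one of its points exceeds the SD factor times its SD. This reading of the option is an assumption, and `has_signal` is a hypothetical name.

```python
import numpy as np


def has_signal(i_t, sd_t, factor=2.5):
    """True if at least one point of an I(t) chromatogram exceeds factor * SD (assumed criterion)."""
    i_t = np.asarray(i_t, dtype=float)
    sd_t = np.asarray(sd_t, dtype=float)
    return bool(np.any(i_t > factor * sd_t))
```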

Test I(t) temporarily regenerates the I(q) vs. q frames so that the results of data treatment (such as baseline correction and/or Gaussian decomposition) can be tested using a series of tools for scaling and Guinier analysis, the latter producing Rg and I(0) values (see here).
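For reference, a Guinier analysis fits ln I(q) = ln I(0) - (Rg²/3) q² over a low-q range, so that Rg = sqrt(-3 × slope) and I(0) = exp(intercept). The weighted least-squares sketch below uses `numpy.polyfit`, with weights I/SD approximating 1/σ of ln I. It is a generic illustration, not the module's own fitting code. Choosing the q range (commonly q·Rg ≲ 1.3 for globular particles) is left to the caller.

```python
import numpy as np


def guinier_fit(q, intensity, sd=None):
    """Return (Rg, I0) from a linear fit of ln I(q) vs. q**2."""
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    x, y = q ** 2, np.log(intensity)
    # polyfit expects weights of 1/sigma; the SD of ln I is approximately SD / I.
    w = None if sd is None else intensity / np.asarray(sd, dtype=float)
    slope, intercept = np.polyfit(x, y, 1, w=w)
    return float(np.sqrt(-3.0 * slope)), float(np.exp(intercept))
```

On noise-free synthetic data generated with Rg = 15 Å and I(0) = 100, the fit recovers both values to numerical precision.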

Make I(q) is the other crucial operation in the HPLC-SAXS module. It regenerates the I(q) vs. q files for each frame after data treatment in frame (or time) space.

Concentration load is used to upload chromatographic data files containing a concentration-related elution profile, such as those produced by UV-VIS absorption or refractive index detectors (the program internally keeps track of such datasets, distinguishing them from SAXS datasets). By default, the program looks for "*.txt" files, but other extensions can be chosen in the file upload dialogue. The currently recognized format for concentration data is similar to the SAXS data format, with the addition of the string "Frame data" anywhere on the first line. The two or three data columns contain the frame number, the concentration-related data, and, optionally, an associated SD value.
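A minimal reader for the concentration format just described might look as follows. It is a sketch based only on the rules stated above: a first line containing "Frame data", then two or three whitespace-separated numeric columns. Blank lines are skipped and any other deviation raises an error, which may be stricter than the actual loader. The function name is hypothetical.

```python
import io


def parse_concentration(lines):
    """Parse concentration data: header containing 'Frame data', then 2 or 3 columns per line."""
    lines = iter(lines)
    header = next(lines, "")
    if "Frame data" not in header:
        raise ValueError('first line must contain "Frame data"')
    rows = []
    for number, line in enumerate(lines, start=2):
        fields = line.split()  # splits on TABs and spaces alike
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) not in (2, 3):
            raise ValueError(f"line {number}: expected 2 or 3 columns, got {len(fields)}")
        rows.append(tuple(float(f) for f in fields))  # (frame, value[, SD])
    return rows


# Usage with an in-memory file; with a real file, pass open(path) instead.
sample = "UV 280 nm Frame data\n1\t0.01\n2 0.02 0.001\n\n3 0.05\n"
rows = parse_concentration(io.StringIO(sample))
```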

Repeak is used to scale data (usually a concentration-related chromatogram) on the y-axis to a pre-set target (usually a low-q, high-intensity I(t) vs. t chromatogram), selectable in a pop-up window among the data subjected to this operation. This operation affects the data: a new file is generated with "rp" and the scaling factor appended to the filename. See more on this subject below.
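The criterion Repeak uses to compute its scaling factor is not detailed here. Purely as an illustration, the sketch below assumes that the factor is chosen so that the peak maximum of the scaled curve matches that of the target, and scales the SDs by the same factor. `repeak` is a hypothetical helper, not the module's code.

```python
import numpy as np


def repeak(source, target, sd=None):
    """Scale `source` so its maximum equals the maximum of `target` (assumed criterion).

    Returns the scaled curve, the scaled SDs (or None), and the scaling factor.
    """
    src = np.asarray(source, dtype=float)
    factor = float(np.max(target)) / float(np.max(src))
    scaled_sd = None if sd is None else np.asarray(sd, dtype=float) * factor
    return src * factor, scaled_sd, factor
```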

Set will set an already uploaded and currently selected file containing the UV or refractive index profile vs. time or frame number as the source of the concentration-dependent signal.

Detector opens a pop-up window in which the type of detector can be selected and its calibration constant entered (see here).


SOMO HPLC-SAXS Lysozyme I(t)

Since a typical HPLC-SAXS experiment produces a series of I(q) vs. q datasets collected at successive time intervals ("frames"), they can be arranged in a 2D matrix in which each row corresponds to a frame number (or time value) and the columns contain the intensities I(q) and their associated SDs at the various q-values. It is then a simple operation to generate another matrix in which the rows correspond to the q-values and the columns contain the intensities I(t) (and their associated SDs) for each frame number (or time value). A new dataset consisting of one I(t) vs. t "chromatogram" for each q-value can then be generated.
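In array terms, the conversion described above is a matrix transposition, as in the NumPy sketch below (function name hypothetical). Transposing back reproduces the frame-wise I(q) layout, which is conceptually what Make I(q) does after the data have been treated in frame space.

```python
import numpy as np


def frames_to_chromatograms(intensities, sds):
    """Turn frame-wise I(q) data into one I(t) chromatogram per q-value.

    intensities, sds: arrays of shape (n_frames, n_q), row i = frame i.
    Returns arrays of shape (n_q, n_frames), row j = I(t) at q[j].
    """
    i_q = np.asarray(intensities, dtype=float)
    sd_q = np.asarray(sds, dtype=float)
    if i_q.shape != sd_q.shape:
        raise ValueError("intensities and SDs must have the same shape")
    return i_q.T.copy(), sd_q.T.copy()
```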
In the image above, the original I(q) vs. q data shown in the first image of this Help section have been transformed into I(t) vs. t data by pressing the Make I(t) button after selecting all files. The I(t) vs. t data are automatically displayed after the conversion, and the q values are now part of the resulting filenames. Since the option On Make I(t), discard I(t) with no signal above st. dev. multiplied by "2.5" was selected in the Options menu (see here), the following warning message appeared:

SOMO HPLC-SAXS discard I(t) files warning

In addition, a test is automatically performed to identify regions within a sliding window (here of 25 frames) where the sum of the intensities is less than the negative of the sum of the corresponding SD values over the window. Regions with negative values will cause problems with the integral baseline subtraction procedure (see more below). Only a single I(t) vs. t chromatogram failed this test, as reported in a pop-up window:

SOMO HPLC-SAXS negative I(t) integral warning
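The sliding-window test can be sketched as follows, taking the description literally: a window starting at index i is flagged when the sum of I(t) over its frames is below minus the sum of the SDs. The window sums are computed with `numpy.convolve` in "valid" mode. The function name is hypothetical, and the returned values are window start indices, not frame labels.

```python
import numpy as np


def negative_windows(i_t, sd_t, width=25):
    """Start indices of windows where sum(I) < -sum(SD) over `width` consecutive frames."""
    i_t = np.asarray(i_t, dtype=float)
    sd_t = np.asarray(sd_t, dtype=float)
    if len(i_t) < width:
        return []
    box = np.ones(width)
    # 'valid' mode yields one sum per complete window position.
    sum_i = np.convolve(i_t, box, mode="valid")
    sum_sd = np.convolve(sd_t, box, mode="valid")
    return np.flatnonzero(sum_i < -sum_sd).tolist()
```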

Some cropping operations (see below) can also be performed to remove very noisy low-q datasets, such as the first three q values displayed in the Figure above (magenta, olive and greenblue), and/or to truncate the datasets if necessary. All operations are recorded in the bottom left panel.

The file names of produced data are shown in the Produced Data panel to the centre-left, and can be selected and saved to files using the appropriate buttons below it.

Select all will select all files in this panel.

Invert will allow toggling the selection between selected files and everything else not currently selected.

Similar will search for similar file names after selecting a single file in this panel.

Remove will discard the selected files.

Two types of files can be produced: csv-style files (Save CSV) or regular three-column .dat files (Save).

Show will add the selected file(s) among those produced to the ones already displayed in the graphics window.

Show only will show only the selected file(s) among those produced in the graphics window.

In the Messages area, the operations performed are tracked, and computed parameters are shown. The display can be copied or cleared from the File pull-down menu.

The last line of the left-side panels contains the Help and Options buttons. On pressing the latter, a pop-up panel will be shown:

SOMO HPLC-SAXS options

See here for a description of this module.


Below the US-SOMO HPLC-SAXS module graphics panel there is a series of buttons for performing several operations on the displayed files. Some of these become available only when multiple files are selected or a region of the graph is zoomed, while others become available only when single files are selected:

SOMO HPLC-SAXS commands

When a part of the graph is selected using the left mouse button, all the buttons in the bottom row become available (only Crop Zeros and Crop Common are available when files have just been displayed after selection).

Two of the top-row commands already deserve a comment at this point:


Visual inspection of the I(t) vs. t chromatograms can already hint at problems, as in the lysozyme data used here as an example. In this case, it is evident that many I(t) vs. t chromatograms, starting from the low-q region, do not return to the pre-peak elution baseline intensity values. This is most likely due to capillary fouling, and without proper correction these data would be largely useless. For this, and for less evident cases, an Integral Baseline correction procedure has been devised.
The Integral Baseline method is based upon the assumption that capillary fouling deposits are formed in proportion to the sample concentration while exposed to the beam, and that neither the buffer nor the instrumental background are contributing to this effect. That deposition on the capillary does occur is clearly proven by the fact that a steady SAXS signal is maintained even after completion of the protein elution. The theory underlying the Integral Baseline correction procedure can be found here.

To help the user decide whether a baseline correction is needed, and to find a proper region of steady-state SAXS signal at the end of the chromatograms, the currently implemented Integral Baseline method requires an analysis of blank frames. These "Blanks" (at least 10 frames must be available, preferably 20 or more) should have been collected well before the void volume, and should preferably be the same ones that were averaged and subtracted from all the data collected during the chromatogram development.

SOMO HPLC-SAXS Blanks loaded

After Blanks files have been loaded using the Add files button (see above), their analysis is launched by pressing the Blanks analysis button. The module will automatically convert the I(q) vs. q frames into I(t) vs. t chromatograms:

SOMO HPLC-SAXS Blanks I(t)

The two vertical magenta lines, and their corresponding fields at the bottom of the button area, define the start and end of the region used for the Blanks analysis. These limits can be changed by clicking on one of the fields and then moving the mouse over the grey-scale bar-wheel just below the graphics window. They can also be changed in single-frame steps by clicking on the "<" and ">" buttons at the ends of the bar-wheel. Alternatively, the limits can be changed manually by entering a numerical value in their respective fields.

The Blanks analysis is performed by clicking on the CorMap analysis button. This launches a pairwise Correlation Map analysis (see here for a description of the CorMap implementation in the US-SOMO HPLC-SAXS module). Before the analysis is actually launched, a pop-up panel will appear:

SOMO HPLC-SAXS CorMap sampling pop-up

During the implementation of the Blanks analysis, it was found that finely spaced q values might result in cross-correlation effects in the CorMap analysis (see also here). Therefore, this pop-up panel allows the user to choose a sampling in q space that eliminates, or at least alleviates, this problem. Since taking every other q value is usually sufficient, this can be done directly by pressing the Sample alternate q points button. Larger sampling intervals can be chosen by entering an integer value after pressing the Specify a larger gap in q points button. If no sampling is wanted, the Continue button should be pressed.
A second pop-up allows the CorMap analysis to start above a chosen qmin value, to avoid including very noisy low-q values in the analysis:

SOMO HPLC-SAXS CorMap minimum q value pop-up

After these choices are made, the analysis is effectively launched, and the results are shown in a new pop-up panel (see here for a full description of the CorMap implementation):

SOMO HPLC-SAXS CorMap of Blanks with alternate points

The top bar of the pop-up panel reports the type of analysis (here "Blanks mode t 1 - 89"), the max q limit used (here 0.05 Å-1), and the sampling used (here "Only every 2nd q value selected").

Three plots are present at the top of the panel:

The text area first reports the pairwise P-map color definition, and then where to look for the correspondence between the axis ticks and the actual data. A summary of the most relevant data follows:

Below these summaries, the first list reports the correspondence between the "Reference numbers" (Ref) assigned to each dataset (here, frames) and its "real" name. This was introduced to avoid complicated names in the axis legends of the plots (in this case, the frame numbers extracted from the I(t) vs. t filenames start at "1", so they are equal to the "Ref" numbers). The list also reports, for each frame, the Avg. P value, the Min. P value, and the % Red points.
At the end of the first list, a second list reports the results of all the pairwise comparisons: the number of points compared (N), the q point position where the longest streak starts (Start point), the length of the longest streak (C), and finally the P-value of a streak of length C occurring in a sequence of N points, as shown in the image below:

SOMO HPLC-SAXS CorMap Blanks list
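The N, Start point, C and P entries can be illustrated with the sketch below, which follows the general CorMap approach. It takes the signs of the point-by-point differences between two curves, finds the longest run of identical signs, and computes the probability that N fair coin tosses contain a run of at least C identical outcomes (of either sign). Here the probability is obtained with a simple dynamic-programming recursion. Counting zero differences as negative and using 0-based start indices are assumptions of this sketch, which is not the module's code.

```python
import numpy as np


def longest_run(signs):
    """Return (start index, length) of the longest run of identical values."""
    best_start, best_len = 0, 0
    start = 0
    for i in range(1, len(signs) + 1):
        # A run ends at the end of the sequence or where the value changes.
        if i == len(signs) or signs[i] != signs[start]:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = i
    return best_start, best_len


def p_longest_run(n, c):
    """P(n fair coin tosses contain a run of >= c identical outcomes)."""
    if c <= 1:
        return 1.0 if n >= 1 else 0.0
    # p[r]: probability that no run >= c has occurred and the current run has length r.
    p = np.zeros(c)
    p[1] = 1.0
    for _ in range(n - 1):
        new = np.zeros(c)
        new[1] = 0.5 * p.sum()   # next toss differs: a new run of length 1 starts
        new[2:] = 0.5 * p[1:-1]  # next toss repeats: the current run grows by one
        p = new
    return float(1.0 - p.sum())


def cormap_pair(a, b):
    """Return (N, start point, C, P) for two curves on the same q grid."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    signs = np.where(d > 0, 1, -1)  # assumption: zero differences count as negative
    start, c = longest_run(signs)
    return len(signs), start, c, p_longest_run(len(signs), c)
```

For example, `cormap_pair([1, 2, 3, 4, 5, 6], [0, 1, 2, 5, 4, 7])` gives N = 6, Start point = 0, C = 3 and P = 0.59375: only 26 of the 64 possible sign sequences of length 6 have no run longer than 2.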

All data listed in the CorMap analysis pop-up window can be saved in a csv-type file with the Save button. Previously analyzed datasets can be recalled with the Load button.

After closing the CorMap analysis window, the Blanks data can be accepted by pressing the Keep button. Cancel will instead discard the current CorMap analysis.

The Integral Baseline analysis of the actual sample frames can then begin. Unlike our previously developed Integral Baseline method, the current version requires that all I(t) vs. t chromatograms be selected before pressing the Baseline button.
In any case, on pressing Baseline a pop-up warning message will always appear:

SOMO HPLC-SAXS Int Bas Blanks Warning

warning that a Blanks analysis is needed to proceed any further, and offering up to three options:

and, if Blanks were analyzed during the current session,

Another pop-up will then appear, reminding that the first step in the Integral Baseline procedure is to find a final region of constant intensity:

SOMO HPLC-SAXS integral baseline prescription

After pressing OK, the graphics window will present all the selected I(t) vs. t chromatograms and switch to the Baseline mode of analysis:

SOMO HPLC-SAXS integral baseline setting 1

As shown in the image above, this superimposes three vertical lines on the right side of the selected chromatograms, replaces the last two rows of buttons under the graphics window with three colored fields (magenta-red-magenta), and draws a dashed horizontal line (orange). In addition, a Fix window width checkbox with its associated magenta-colored field is now present (default: 20 frames, unchecked), as well as a new Find best region button.
The first vertical magenta line, which by default is positioned at 75% of the available frames, has multiple uses:

The second vertical magenta line defines the end of the sliding window (default: 20 frames beyond the first magenta line).

The vertical red line defines the end for the sliding window analysis (default position: 5 frames from the end of the available frames).

The horizontal orange line represents the average intensity across the current window of the lowest q-value among the selected I(t) vs. t chromatograms.

Remember that the baseline is set to zero at the beginning of the data, on the left side.

The positions of the three vertical lines are shown in the three color-coded fields. By clicking on one of the fields, the position of the corresponding vertical line can be changed using either the grey-shaded bar-wheel or the "<" and ">" buttons at its sides. Values can also be entered manually.

If the Fix window width checkbox is not selected, moving either of the two vertical magenta lines will also change the width of the sliding window.

It is therefore best to first define a window width by moving either of the vertical magenta lines, and then fix it by selecting the Fix window width checkbox. At this point, the entire window can be positioned by moving either of the two vertical magenta lines. It is suggested to position it in a region where there is still some visible intensity decay, as shown below:

SOMO HPLC-SAXS integral baseline setting 2

The baseline analysis is then completed by pressing the Find best region button.

This will launch a special CorMap analysis in which first a global CorMap calculation will be carried out between the entire range of frames from the first vertical magenta line to the vertical red line. Subsets of this CorMap analysis corresponding to the sliding window regions will then be extracted and compared with the average of all possible CorMap analysis results extracted from the pre-analyzed Blanks data for a sliding window of the same size.

In addition, the analysis will calculate the integrated average intensity at each frame of all the I(q) values from the minimum q-value selected up to the qmax defined in the Options panel (default: 0.05 Å-1).
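The frame-wise low-q average can be sketched as shown below. Whether the module propagates the SD of this average from the per-point SDs, as assumed here (sqrt(Σσ²)/n for n points), or uses another estimate is not stated. The function and argument names are hypothetical; the qmax default mirrors the 0.05 Å-1 Options value.

```python
import numpy as np


def average_low_q(q, intensities, sds, qmax=0.05, qmin=None):
    """Per-frame average of I(q) for qmin <= q <= qmax, with a propagated SD (assumed).

    intensities, sds: arrays of shape (n_frames, n_q).
    """
    q = np.asarray(q, dtype=float)
    mask = q <= qmax
    if qmin is not None:
        mask &= q >= qmin
    n = int(mask.sum())
    if n == 0:
        raise ValueError("no q values in the selected range")
    i_sel = np.asarray(intensities, dtype=float)[:, mask]
    sd_sel = np.asarray(sds, dtype=float)[:, mask]
    return i_sel.mean(axis=1), np.sqrt((sd_sel ** 2).sum(axis=1)) / n
```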

The results will appear in two pop-up panels. The first one is analogous to the one appearing after the Blanks analysis:

SOMO HPLC-SAXS integral baseline Sample CorMap

Here, note the almost completely red left and top sides of the Pairwise P value map plot, which arise because regions on the descending side of the elution peak were included in the analysis. This also heavily affects the right-side Red cluster size histogram: a single, almost invisible bar for an extremely low-count cluster of huge size (≈5000) greatly compresses the scale. Zooming on the low red cluster size region reveals the following:

SOMO HPLC-SAXS integral baseline Sample CorMap zoom

But most relevant is the second pop-up panel that will appear on top of the first:

SOMO HPLC-SAXS integral baseline Sample CorMap analysis

The graph in this panel is composed of two plots, both as a function of the starting window position. The bottom histogram (left-side y-axis scale) reports the average red cluster size for each window in the sliding window ensemble. The horizontal cyan solid line marks the Blanks average red cluster size over all possible windows of the same size as the sliding window used for the Sample analysis (the dotted line represents +1 SD). The histogram bars are colored red when they are above the Blanks +1 SD value, and green or white when they are ≤ the Blanks +1 SD value, with white marking the lowest value(s) (equal values are possible).
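To make the histogram logic concrete, the sketch below makes several assumptions. A "red cluster" is taken to be a 4-connected group of red pixels in the pairwise P-value map, given as a boolean matrix. The sketch computes the average cluster size within each square sliding window along the diagonal, and colors the bars following the rule just described. The connectivity, the score of 0 for windows without red pixels, and all names are assumptions.

```python
from collections import deque

import numpy as np


def cluster_sizes(red):
    """Sizes of 4-connected groups of True pixels in a boolean matrix."""
    red = np.asarray(red, dtype=bool)
    seen = np.zeros_like(red)
    rows, cols = red.shape
    sizes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not red[r0, c0] or seen[r0, c0]:
                continue
            seen[r0, c0] = True
            queue, size = deque([(r0, c0)]), 0
            while queue:  # breadth-first flood fill of one cluster
                r, c = queue.popleft()
                size += 1
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= rr < rows and 0 <= cc < cols and red[rr, cc] and not seen[rr, cc]:
                        seen[rr, cc] = True
                        queue.append((rr, cc))
            sizes.append(size)
    return sizes


def window_scores(red, width):
    """Average red cluster size in each width x width window along the diagonal."""
    red = np.asarray(red, dtype=bool)
    scores = []
    for start in range(red.shape[0] - width + 1):
        sizes = cluster_sizes(red[start:start + width, start:start + width])
        scores.append(float(np.mean(sizes)) if sizes else 0.0)
    return scores


def color_bars(scores, blanks_mean, blanks_sd):
    """Red above Blanks mean + 1 SD; otherwise green, or white for the lowest value(s)."""
    limit = blanks_mean + blanks_sd
    below = [v for v in scores if v <= limit]
    lowest = min(below) if below else None
    return ["red" if v > limit else "white" if v == lowest else "green" for v in scores]
```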

The top plot (right-side y-axis scale) reports the averaged I(q) for q ≤ the qmax value (0.05 Å-1 by default, as set in the Options), as the solid orange line, with the dotted orange lines representing ±1 SD. The solid magenta line defines the zero value expected for blanks-subtracted data when only buffer is present.

The goal of this combined analysis is twofold:

The first message appearing in the text region concerns the second of the points listed above. If the average integrated I(q) ±1 SD vs. starting frame position plot is always above the zero reference line and the average red cluster size is less than the Blanks' average red cluster size +1 SD, the message Integral baseline correction is possible appears.
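The condition for this message can be written as a small predicate. Two points are interpretations rather than documented behavior. "Above the zero reference line" is checked as the average minus 1 SD being positive at every starting position, and the red cluster criterion is applied to the lowest-scoring window. The function name is hypothetical.

```python
import numpy as np


def integral_baseline_possible(avg_i, sd_i, scores, blanks_mean, blanks_sd):
    """Hypothetical check mirroring the 'Integral baseline correction is possible' message."""
    # Interpretation: the average - 1 SD band must stay above zero at every position.
    above_zero = bool(np.all(np.asarray(avg_i, dtype=float) - np.asarray(sd_i, dtype=float) > 0.0))
    # Interpretation: the best (lowest) window must be below the Blanks mean + 1 SD.
    clusters_ok = min(scores) < blanks_mean + blanks_sd
    return above_zero and clusters_ok
```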

Below the summary sentence, a first report of the baseline analysis is printed. It contains:

A second block of information is then printed further down, containing Blanks-related information: