In this chapter, the user will be walked trough an example of how new polymer chemistry definition data can be generated and included in the automatic “data detection system” of massXpert (that is how new polymer chemistry definitions should be registered with the system).
Customization is typically performed by the normal user (not the
Administrator nor the Root of the machine) and as such new data are
typically stored in the user's “home” directory. On
UNIX machines, the “home” directory
is usually the /home/username
directory, where username is the login username. On
MS-Windows, that directory is typically the
C:/Documents and Settings/username
once again with username being the login username.
Although MS-Windows pathnames use a back slash (“\”, in this book these are composed using forward slashes for a number of valid reasons. The reader only needs to replace back slashes with the forward variety.
In the next sections we will refer to that “home directory” (be it on UNIX or MS-Windows machines) as the $HOME directory, as this the standard environment variable describing that directory in GNU/Linux.
When massXpert is executed, it automatically tries to read data
configuration files from the home directory (in the .massXpert2
directory). Once this is done, it
reads all the data configuration files in the installation directory
(typically, on GNU/Linux that would be the
configuration data in the /usr/share/massxpert2/data
directory or, on
MS-Windows, the c:/Program Files/massXpert2/data
directory).
We said above that massXpert2 tries to read the data configuration files from the home directory. But upon its very first execution, right after installation, that directory does not exist, and in fact massXpert2 creates that directory for us to populate it some day with interesting new data.
The $HOME/massxpert2
directory
should have a structure mimicking the one that was created upon installation
of the software, that is, it should contain the following two directories:
polChemDefs
polSeqs
Those are the directories where the user is invited to store their
personal data. In order to start a new definition, one might simply copy in
the polChemDefs
one of the polymer
chemistry definitions that are shipped with massXpert. What should be copied?
An entire polymer chemistry definition directory, like for example the
following:
/usr/local/share/massxpert2/polChemDefs/protein-1-letter
or
C:/Program
Files/msXpertSuite/data/massxpert2/polChemDefs/protein-1-letter
Once that polymer chemistry definition is copied, one may start studying how it actually works. This directory contains the following kinds of files:
protein-1-letter.xml
: the polymer chemistry
definition file. This is the file that is read upon selection of the
corresponding polymer chemistry definition name in XpertDef. If the polymer
chemistry definition is not yet registered with the system (described
later), then open that file by browsing to it by clicking
Cancel (see Chapter 3, XpertDef: Definition of Polymer Chemistries);
SVG files: scalar vector
graphics files used to render graphically the sequence in
the sequence editor. For example, arginine.svg
contains the graphical representation of the arginine monomer. There are
such graphics files also for the modifications (like, for example, the
sulphation.svg
contains the graphical
representation of the sulphation modification. Figure 8.1, “The polymer chemistry definition directory” shows
two examples of SVG files belonging to two distinct
polymer chemistry definitions;
chemPad.conf
: configuration file for the chemical pad in
the XpertCalc module;
monomer_dictionary
: file establishing the relationship
between any monomer code of the polymer chemistry definition and the graphical
SVG file to be used to render graphically that monomer in the
sequence editor;
modification_dictionary
: file establishing the
relationship between any monomer modification (see Section 4.8.1, “Selected Monomer(s) Modification”) and the graphical
SVG file to be used to render graphically that
modification onto the modified monomer in the sequence editor;
cross_linker_dictionary
: file establishing the
relationship between any cross-link (see Section 4.9, “Monomer Cross-linking”) and the graphical
SVG file to be used to render graphically that
cross-link onto the cross-linked monomers in the sequence editor;
pka_ph_pi.xml
: file describing the acido-basic data
(see Section 4.16, “pKa, pH, pI and Charges”) pertaining to
ionizable chemical groups in the different entities of the polymer
chemistry definition;
The polymer sequence editor is not a classical editor. There is no font in this editor: when the user starts keying-in a polymer sequence in the editor, the small SVG graphics files are rendered into raster vignettes at both the proper resolution and screen size and displayed in the sequence editor. The user is totally in charge of designing the SVG graphics files for each of the monomers defined in the polymer sequence editor. Of course, reusing material is perfectly possible.
One powerful software to edit SVG files is Inkscape, on any platform.
There is one constraint: that the
monomer_dictionary
file lists with precision
“what monomer code goes with what SVG graphics
file”. That file has the following contents, for example, for the
“protein-1-letter” polymer chemistry definition, as shipped
in the massXpert package:
# This file is part of the massXpert2 project.
# The "massXpert2" project is released ---in its entirety--- under the
# GNU General Public License and was started (in the form of the GNU
# polyxmass project) at the Centre National de la Recherche
# Scientifique (FRANCE), that granted me the formal authorization to
# publish it under this Free Software License.
# Copyright (C) 2006,2023 Filippo Rusconi
# This is the monomer_dictionary file where the correspondences
# between the codes of each monomer and their graphic file (pixmap
# file called "image") used to graphicallly render them in the
# sequence editor are made.
# The format of the file is like this :
# -------------------------------------
# A%alanine.svg
# where A is the monomer code and alanine.svg is a
# resolution-independent svg file.
# Each line starting with a '#' character is a comment and is ignored
# during parsing of this file.
# This file is case-sensitive.
A%alanine.svg
C%cysteine.svg
D%aspartate.svg
E%glutamate.svg
F%phenylalanine.svg
G%glycine.svg
H%histidine.svg
I%isoleucine.svg
K%lysine.svg
L%leucine.svg
M%methionine.svg
N%asparagine.svg
P%proline.svg
Q%glutamine.svg
R%arginine.svg
S%serine.svg
T%threonine.svg
V%valine.svg
W%tryptophan.svg
Y%tyrosine.svg
What one sees from the contents of the file is that each monomer code
has an associated SVG file. For example, when the user
has to key-in a valine monomer, they key-in the code V and
XpertEdit knows that the monomer vignette to show has to be rendered using the
valine.svg
file.
For the monomer modification graphical rendering, the situation is
somewhat different, as seen in the
modification_dictionary
file:
# This file is part of the massXpert2 project.
# The "massXpert2" project is released ---in its entirety--- under the
# GNU General Public License and was started (in the form of the GNU
# polyxmass project) at the Centre National de la Recherche
# Scientifique (FRANCE), that granted me the formal authorization to
# publish it under this Free Software License.
# Copyright (C) 2006,2023 Filippo Rusconi
# This is the modification_dictionary file where the correspondences
# between the name of each modification and their graphic file (pixmap
# file called "image") used to graphicallly render them in the
# sequence editor are made. Also, the graphical operation that is to
# be performed upon chemical modification of a monomer is listed ('T'
# for transparent and 'O' for opaque). See the manual for details.
# The format of the file is like this :
# -------------------------------------
# Phosphorylation%T%phospho.svg
# where Phosphorylation is the name of the modification. T indicates
# that the visual rendering of the modification is a transparent
# process (O indicates that the visual rendering of the modification
# is a full image replacement 'O' like opaque). phospho.svg is a
# resolution-independent svg file.
# Each line starting with a '#' character is a comment and is ignored
# during parsing of this file.
# This file is case-sensitive.
Phosphorylation%T%phospho.svg
Sulphation%T%sulpho.svg
AmidationAsp%O%asparagine.svg
Acetylation%T%acetyl.svg
AmidationGlu%O%glutamine.svg
Oxidation%T%oxidation.svg
There are two ways to render a chemical modification of a monomer:
Opaque rendering: the initial monomer vignette is replaced using the one listed in the file for the modification. One example is the “AmidationGlu” modification:
AmidationGlu%O%glutamine.svg
Upon amidation of a glutamyl residue (“AmidationGlu”
is the name of a modification in the current polymer chemistry
definition), the graphical representation of the modification involves
the replacement of the glutamyl residue vignette
in the sequence editor with the new one, that happens to be in the
glutamine.svg
file. In other words, the process
involves an “Opaque” overlay of the
vignette for unmodified Glu with a vignette rendered by using the
glutamine.svg
file.
Transparent rendering: the initial monomer vignette is overlaid with a new vignette that is read from an SVG file that has a transparent background. One example is the “Phosphorylation” modification:
Phosphorylation%T%phospho.svg
The monomer undergoing a phosphorylation has its vignette overlaid
with a “Transparent” one showing a
small red “P” that is read from the
phospho.svg
file.
When designing vignettes, the best thing to do is to convert the text to path, so that the rendering is absolutely perfect.
It is absolutely essential, for the proper working of the sequence editor, that the SVG files be square (that is, width = height).
Once the new polymer chemistry has been correctly defined, it is time to register that new definition to the system. To recap: all the files for that definition should reside in a same directory, exactly the same way as the files pertaining to a given polymer chemistry definition are shipped in massXpert altogether in one directory. The name of the new polymer chemistry definition should be unambiguous, with respect to other registered polymer chemistry definitions.
The way personal polymer chemistry definitions are registered is by creating a personal polymer chemistry definition catalogue file, which must comply with both following requirements:
Be named xxxxx-polChemDefsCat
, with
“xxxxx” being a discretionary string (this might well be
your login username). The requirement is that
-polChemDefsCat
be the last
part of the filename.
Please DO NOT USE spaces or diacritical signs in your filenames. RESTRICT yourself to ASCII characters between [a-z], [0-9], “_” and “-”.
This is actually something very general as a recommendation in order to not suffer from severe headaches when you least expect them…
Be located in the $HOME/.massXpert2/data/polChemDefs
directory
and have the following format:
dna=/path/to/definition/directory/dna/dna.xml
In this example, the “dna” polymer chemistry
definition is being registered as a file dna.xml
located in the dna
directory,
itself located in the /path/to/definition/directory
directory;
Note that if a new polymer chemistry definition should be made available system-wide, then it is logical that its directory be placed along the ones shipped with massXpert and a new local catalogue file might be created to register the new polymer chemistry definition.
At this point the new polymer chemistry definition might be tested. Typically, that involves restarting the massXpert program and creating a brand new polymer sequence of the new definition type. The first step is to check if the new definition is successfully registered with the system, that is, it should show up a an available definition upon creation of the new polymer sequence. If not, then that means that the catalogue file could not be found or parsed correctly.
When problems like this one occurs, the first thing to do is to ensure that the console window (on MS-Windows it is systematically started along with the program; on GNU/Linux the way to have it is to start the program from the shell) so as to look with attention at the different messages that might help understanding what is failing.
Please, do not hesitate to submit bug reports (see Section 2, “Feedback from the users” for the method to contect the author for feature requests or bug reports).