Systems Biology Markup Language (SBML) Level 1:
Structures and Facilities for Basic Model Definitions


Michael Hucka, Andrew Finney, Herbert Sauro, Hamid Bolouri

{mhucka,afinney,hsauro,hbolouri}@cds.caltech.edu

Systems Biology Workbench Development Group
JST ERATO Kitano Symbiotic Systems Project
Control and Dynamical Systems, MC 107-81
California Institute of Technology, Pasadena, CA 91125, USA
http://www.cds.caltech.edu/erato

Principal Investigators: John Doyle and Hiroaki Kitano


SBML Level 1, Version 2 (Final)
28 August 2003




1 Introduction

We present the Systems Biology Markup Language (SBML) Level 1, Version 2, a description language for simulations in systems biology. SBML is oriented towards representing biochemical networks common in research on a number of topics, including cell signaling pathways, metabolic pathways, biochemical reactions, gene regulation, and many others. A recent conference (Kitano, 2001) highlights the range of topics that fall under the umbrella of systems biology and are in the domain of the description language defined here. Many contemporary research initiatives demonstrate the growing popularity of this kind of multidisciplinary work (e.g., Smaglik, 2000a; Abbott, 1999; Smaglik, 2000b; Popel and Winslow, 1998; Gilman, 2000).

SBML Level 1 is the result of merging modeling-language features from the following simulation systems: BioSpice (Arkin, 2001), DBSolve (Goryanin et al., 1999; Goryanin, 2001), E-Cell (Tomita et al., 2001, 1999), Gepasi (Mendes, 1997, 2001), Jarnac (Sauro, 2000; Sauro and Fell, 1991), StochSim (Morton-Firth and Bray, 1998; Bray et al., 2001), and Virtual Cell (Schaff et al., 2001, 2000). SBML was developed with the help of the authors of these packages. As a result of being based on actual working simulation software, it is a practical and functional description language. Our goal in creating it has been to provide an open standard that will enable simulation software to exchange models, something that is currently impossible because there is no standard model exchange language. We expect SBML models to be encoded using XML, the eXtensible Markup Language (Bray et al., 1998; Bosak and Bray, 1999), and we include here an XML Schema that defines SBML Level 1.

1.1 Summary of Changes in Version 2 of SBML Level 1

This document describes Version 2 of SBML Level 1. Changes with respect to Version 1 of the SBML specification are indicated in red. Most changes in this document are simply textual changes made in an attempt to clarify the language of the specification and to correct typographical and other small errors. The following list is an overview of the more notable changes:

In addition, we have established the web site http://www.sbml.org as the home site for SBML, and all documents, schemas and software are available from there.

1.2 Scope and Limitations

SBML Level 1 is meant to support non-spatial biochemical models and the kinds of operations that are possible in existing analysis/simulation tools. A number of potentially desirable features have been intentionally omitted from the language definition. Future software tools will undoubtedly require the evolution of SBML; we expect that subsequent releases of SBML (termed levels) will add structures and facilities currently missing from Level 1, once the simulation community gains experience with the current language definition. In Section 6.1, we discuss extensions that will likely be included in SBML Level 2 or 3.

The definition of the model description language presented here does not specify how programs should communicate or read/write SBML. We assume that for a simulation program to communicate a model encoded in SBML, the program will have to translate its internal data structures to and from SBML, use a suitable transmission medium and protocol, etc., but these issues are outside of the scope of this document.

1.3 Notational Conventions

SBML is intended to be a common XML-based format for encoding systems biology models in a simple form that software tools can use as an exchange format. However, for easier communication to human readers, we define SBML using a graphical notation based upon UML, the Unified Modeling Language (Oestereich, 1999; Eriksson and Penker, 1998). This UML-based definition in turn is used to define an XML Schema (Fallside, 2000; Thompson et al., 2000; Biron and Malhotra, 2000) for SBML. There are three main advantages to using UML as a basis for defining SBML data structures. First, compared to other notations or a programming language, UML visual representations are generally easier for readers who are not computer scientists to grasp. Second, the visual notation is implementation-neutral: the defined structures can be encoded in any concrete implementation language, not just XML but also C or Java. Third, UML is a de facto industry standard that is documented in many sources, so readers are more likely to be familiar with it than with other notations.

Our notation and our approach for mapping UML to XML Schemas is explained in a separate document (Hucka, 2000). A summary of the essential points is presented in Appendix A, and examples throughout this document illustrate the approach. We also follow certain naming and typographical conventions throughout this document. Specifically, the names of data structure attributes or fields begin with a lowercase letter, and the names of data structures and types begin with an uppercase letter. Keywords (names of types, XML elements, etc.) are written in a typewriter-style font; for example, Compartment is a type name and compartment is a field name. Literal XML examples are likewise written in a typewriter-style font.


2 Overview of SBML

[Figure: a simple, hypothetical network of biochemical reactions]

The figure above shows a simple, hypothetical network of biochemical reactions that can be represented in SBML. Broken down into its constituents, this model contains a number of components: reactant species, product species, reactions, rate laws, and parameters in the rate laws.

To analyze or simulate this network, additional components must be made explicit, including compartments for the species, and units on the various quantities. The top level of an SBML model definition simply consists of lists of these components:


beginning of model definition
    list of unit definitions
    list of compartments
    list of species
    list of parameters
    list of rules
    list of reactions
end of model definition

The meaning of each component is as follows:
Unit definition: A name for a unit used in the expression of quantities in a model. Units may be supplied in a number of contexts in an SBML model, and it is convenient to have a facility for both setting default units and for allowing combinations of units to be given abbreviated names.

Compartment: A container of finite volume for substances. In SBML Level 1, a compartment is primarily a topological structure with a volume but no geometric qualities.

Species: A substance or entity that takes part in a reaction. Some example species are ions such as Ca$^{2+}$ and molecules such as glucose or ATP. The primary qualities associated with a species in SBML Level 1 are its initial amount and the compartment in which it is located.

Reaction: A statement describing some transformation, transport or binding process that can change the amount of one or more species. For example, a reaction may describe how certain entities (reactants) are transformed into certain other entities (products). Reactions have associated rate laws describing how quickly they take place.

Parameter: A quantity that has a symbolic name. SBML Level 1 provides the ability to define parameters that are global to a model as well as parameters that are local to a single reaction.

Rule: In SBML, a mathematical expression that is added to the differential equations constructed from the set of reactions and can be used to set parameter values, establish constraints between quantities, etc.

A software package can read in a model expressed in SBML and translate it into its own internal format for model analysis. For instance, a package might provide the ability to simulate a model by constructing a set of differential equations representing the network and then performing numerical integration on the equations to explore the model's dynamic behavior.

SBML allows models of arbitrary complexity to be represented. Each type of component in a model is described using a specific type of data structure that organizes the relevant information. The data structures determine how the resulting model is encoded in XML.

In the sections that follow, the various constructs in SBML and their uses are described in detail. Section 3 first introduces a few basic structures that are used throughout SBML, then Section 4 provides details on each of the main components of SBML. Section 5 provides several complete examples of models encoded in XML using SBML.


3 Preliminary Definitions

This section covers certain constructs that are used repeatedly in the rest of SBML and are useful to discuss before diving into the details of the components provided in SBML.


3.1 Type SBase

Each of the main components composing an SBML model definition has a specific data type that is derived directly or indirectly from a single abstract type called SBase. This inheritance hierarchy is depicted in Figure 1.

Figure 1: A UML diagram of the inheritance hierarchy of major data types in SBML. Open arrows indicate inheritance, pointing from inheritors to their parents (Oestereich, 1999; Eriksson and Penker, 1998).

The type SBase is designed to allow a modeler or a software package to attach information to each component in an SBML model. The definition of SBase is presented in Figure 2. SBase contains two fields, both of which are optional: notes and annotation. The field notes is a container for XHTML content. It is intended for recording optional user-visible annotations. Every data object derived directly or indirectly from type SBase can have a separate value for notes, allowing users considerable freedom for annotating their models. The second field, annotation, is provided for software-generated annotations. It is a container for arbitrary data (XML type any) and is intended to store information not intended for human viewing. As with the user-visible notes field, every data object can have its own annotation value.
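Because every component inherits from SBase, the structure in Figure 2 maps naturally onto class inheritance in an implementation language. The following Python sketch is illustrative only: the class layout, the choice to store notes and annotation as raw strings, and the partial Compartment class (only two of its fields) are assumptions made here, not part of SBML.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SBase:
    """Sketch of the abstract base type; both fields are optional."""
    notes: Optional[str] = None       # XHTML content for human readers
    annotation: Optional[str] = None  # arbitrary XML for software (type any)


@dataclass
class Compartment(SBase):
    """Hypothetical partial Compartment, showing that it inherits SBase."""
    name: str = ""       # required in SBML; enforced in __post_init__
    volume: float = 1.0  # the volume defaults to 1

    def __post_init__(self):
        # The empty default exists only because dataclass fields that follow
        # defaulted base-class fields must also have defaults.
        if not self.name:
            raise ValueError("Compartment requires a name")
```

Every derived object can thus carry its own notes and annotation, e.g. Compartment(name="cell", notes="<p>Whole cell</p>").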

Figure 2: The definition of SBase. Text enclosed in braces next to attribute types (e.g., {minOccurs="1"}) indicates constraints on the possible attribute values; we use XML Schema language to express constraints since we are primarily interested in the XML encoding of SBML.

The Version 1 specification of SBML Level 1 was inconsistent about the spelling of the annotation field. It named the field annotation in Figure 2, but used annotations (i.e., plural) in the discussions throughout the document. SBML Level 1 Version 2 clarifies that annotation (singular) is the intended name.

In other type definitions presented below, we follow the UML convention of eliding the attributes derived from a parent type such as SBase. It should be kept in mind that these attributes are always available.


3.2 Guidelines for the Use of the annotation Field in SBase

The annotation field in the definition of SBase is formally unconstrained in order that software developers may attach any information they need to different components in an SBML model. However, it is important that this facility not be misused accidentally. In particular, it is critical that information essential to a model definition is not stored in annotation. Parameter values, functional dependencies between model components, etc., should not be recorded as annotations.

Here are examples of the kinds of data that may be appropriately stored in annotation: (a) information about graphical layout of model components; (b) application-specific processing instructions that do not change the essence of a model; (c) bibliographic information pertaining to a given model; and (d) identification information for cross-referencing components in a model with items in a database. (We expect to introduce an explicit scheme for recording bibliographic information and making database references in a higher level of SBML, at which time using annotations for these purposes will become unnecessary.)

Different applications may use XML Namespaces (Bray et al., 1999) to specify the intended vocabulary of a particular annotation. Here is an example of this kind of usage. Suppose that a particular application wants to annotate data structures in an SBML model definition with screen layout information and a time stamp. The application developers should choose a URI (Uniform Resource Identifier; Harold and Means 2001; W3C 2000a) reference that uniquely identifies the vocabulary that the application will use for such annotations, and a prefix string to be used in the annotations. For illustration purposes, let us say the URI reference is ``http://www.mysim.org/ns'' and the prefix is mysim. An example of an annotation might then be as follows:


\begin{example}
...
<annotation xmlns:mysim="http://www.mysim.org/ns">
    <mysim...
    ...timestamp>2000-12-18 18:31 PST</mysim:timestamp>
</annotation>
...
\end{example}

The namespace prefix mysim is used to qualify the XML elements mysim:nodecolors and mysim:timestamp; presumably these symbols have meaning to the application. This example places the XML Namespace information on annotation itself rather than on a higher-level enclosing construct or the enclosing document level, but other placements would be valid as well (Bray et al., 1999).
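The following Python sketch, using the standard library's xml.etree.ElementTree, shows why the prefix string itself does not matter to a consuming program: the parser resolves mysim:timestamp to the namespace URI plus the local name. The annotation text is a hypothetical, reduced version of the example above (only the timestamp element is reproduced, since the rest is not shown in full), and read_timestamp is an illustrative name.

```python
import xml.etree.ElementTree as ET

MYSIM_NS = "http://www.mysim.org/ns"  # the illustrative namespace from the text

# Hypothetical reduced annotation: only the timestamp element is included.
annotation_xml = (
    '<annotation xmlns:mysim="http://www.mysim.org/ns">'
    "<mysim:timestamp>2000-12-18 18:31 PST</mysim:timestamp>"
    "</annotation>"
)


def read_timestamp(xml_text):
    """Return the mysim timestamp text, or None if it is absent."""
    root = ET.fromstring(xml_text)
    # ElementTree expands "mysim:timestamp" to "{namespace-URI}timestamp",
    # so the lookup depends only on the URI, not on the prefix chosen.
    element = root.find(f"{{{MYSIM_NS}}}timestamp")
    return None if element is None else element.text
```

An annotation written with a different prefix bound to the same URI is read identically, which is what lets annotations from several tools coexist.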

The use of XML Namespaces permits multiple applications to place annotations on XML elements of a model without risking interference or element name collisions. Annotations stored by different simulation packages can therefore coexist in the same model definition. Although XML Namespace names (``http://www.mysim.org/ns'' in the example above) must be URI references, an XML Namespace name is not required to be directly usable in the sense of identifying an actual, retrievable document or resource on the Internet (Bray et al., 1999). The name is simply intended to enable unique identification of constructs, and using URIs is a common and simple way of creating a unique name string. For the convenience of developers of simulation and analysis tools, we reserve certain namespace names for use with annotations in SBML. These reserved names are listed in Table 1.


Table 1: Reserved XML Namespace names in SBML Level 1 Version 2.
http://www.sbml.org/2001/ns/basis         http://www.sbml.org/2001/ns/jdesigner
http://www.sbml.org/2001/ns/biocharon     http://www.sbml.org/2001/ns/jigcell
http://www.sbml.org/2001/ns/bioreactor    http://www.sbml.org/2001/ns/jsim
http://www.sbml.org/2001/ns/biosketchpad  http://www.sbml.org/2001/ns/libsbml
http://www.sbml.org/2001/ns/biospice      http://www.sbml.org/2001/ns/mathsbml
http://www.sbml.org/2001/ns/cellerator    http://www.sbml.org/2001/ns/mcell
http://www.sbml.org/2001/ns/copasi        http://www.sbml.org/2001/ns/netbuilder
http://www.sbml.org/2001/ns/cytoscape     http://www.sbml.org/2001/ns/pathdb
http://www.sbml.org/2001/ns/dbsolve       http://www.sbml.org/2001/ns/promot
http://www.sbml.org/2001/ns/ecell         http://www.sbml.org/2001/ns/sbedit
http://www.sbml.org/2001/ns/gepasi        http://www.sbml.org/2001/ns/sigpath
http://www.sbml.org/2001/ns/isys          http://www.sbml.org/2001/ns/stochsim
http://www.sbml.org/2001/ns/jarnac        http://www.sbml.org/2001/ns/vcell


Note that the namespaces being referred to here are XML Namespaces specifically in the context of the annotation field on SBase. The namespace issue here is unrelated to the namespaces discussed in Section 3.4 in the context of SName and symbols in SBML.


3.3 Type SName

The type SName is used in many places in SBML for expressing names of components in a model. SName is a data type derived from the basic XML type string, but with restrictions on the characters permitted and the sequence in which they may appear. Its definition is shown in Figure 3.

Figure 3: The definition of the type SName, expressed in the variant of Extended Backus-Naur Form (EBNF) used by the XML 1.0 specification (Bray et al., 2000). The characters ( and ) are used for grouping, the character | separates alternatives, and the character * signifies ``zero or more times'' the immediately preceding term.
  letter   ::= 'a'..'z','A'..'Z'
  digit    ::= '0'..'9'
  name     ::= ( letter | '_' ) ( letter | digit | '_' )*

The need to define a constrained data type for names stems from the fact that many existing simulation packages allow only a limited set of characters in symbol names. SBML codifies this limitation in the form of a lowest-common-denominator data type (SName), to prevent the creation of models with symbol names that might confuse some simulation software packages. This is important for facilitating model exchange between tools.
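Since Figure 3 describes a regular language, an SName check can be sketched as a single regular expression. The Python function below is a minimal sketch; the name is_sname is hypothetical, and the check deliberately does not consult the reserved names listed in Table 2.

```python
import re

# Figure 3 as a regular expression: a letter or underscore, followed by zero
# or more letters, digits or underscores. ASCII only, as in the grammar.
_SNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_sname(text: str) -> bool:
    """True if text matches the SName production exactly."""
    return _SNAME.fullmatch(text) is not None
```

For example, glucose_6_phosphate and _k1 are valid, while 6PG (leading digit), Ca2+ (illegal character) and the empty string are not.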


3.4 Component Names and Namespaces in SBML

A biochemical network model can contain a large number of named components representing different parts of a model. This leads to a problem in deciding the scope of a symbol: in what contexts does a given symbol X represent the same thing? The approaches used in existing simulation packages tend to fall into two categories that we may call global and local. The global approach places all symbols into a single global namespace, so that a symbol X represents the same thing wherever it appears in a given model definition. The local approach places symbols in different namespaces depending on the context, where the context may be, for example, individual rate laws. The latter approach means that a user may use the same symbol X in different rate laws and have each instance represent a different quantity. The fact that different simulation programs may use different rules for name resolution poses a problem for the exchange of models between simulation tools. Without careful consideration, a model written out in SBML format by one program may be misinterpreted by another program. SBML must therefore include a specific set of rules for treating symbols and namespaces.

The namespace rules in SBML Level 1 are relatively straightforward and are intended to avoid this problem with a minimum of requirements on the implementation of software tools:


Table 2: The reserved names in SBML Level 1.
abs   cos      hillr   massr   pow        tan    ucii  umai  usii  uur
acos  exp      isouur  not     ppbr       time   ucir  umar  usir  volume
and   floor    log     or      sin        uai    ucti  umi   uuci  xor
asin  hilli    log10   ordbbr  sqr        uaii   uctr  umr   uucr
atan  hillmmr  mass    ordbur  sqrt       ualii  uhmi  unii  uuhr
ceil  hillmr   massi   ordubr  substance  uar    uhmr  unir  uui


The set of rules above can enable software packages using either local or global namespaces to exchange SBML model definitions. In particular, software environments using local namespaces internally should be able to accept SBML model definitions without needing to change component names. Environments using a global namespace internally can perform a simple manipulation of the names of elements within reaction definitions to avoid name collisions. (An example approach for the latter would be the following: when receiving an SBML-encoded model, prefix each name inside each reaction with a string constructed from the reaction's name; when writing an SBML-encoded model, strip off the prefix.)
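The prefixing strategy described in the parenthetical remark can be sketched in a few lines of Python. The double-underscore separator and the function names are hypothetical choices; a real tool would also have to check that a generated name does not collide with a name already present in its global namespace.

```python
SEPARATOR = "__"  # hypothetical separator; the text does not prescribe one


def import_reaction_names(reaction, local_names):
    """Map each name local to a reaction to a globally unique name."""
    return {name: f"{reaction}{SEPARATOR}{name}" for name in local_names}


def export_name(reaction, global_name):
    """Strip the reaction prefix again when writing the model back out."""
    prefix = reaction + SEPARATOR
    if not global_name.startswith(prefix):
        raise ValueError(f"{global_name!r} was not created for reaction {reaction!r}")
    return global_name[len(prefix):]
```

With this scheme a local parameter k in reactions R1 and R2 becomes R1__k and R2__k internally, and both are written back out as k.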

The namespace rules described here provide a clean transition path to future levels of SBML, when submodels are introduced (Section 6.1). Submodels will provide the ability to compose one model from a collection of other models. This capability will have to be built on top of SBML Level 1's namespace organization. A straightforward approach to handling namespaces is to make each submodel's namespace private. The rules governing namespaces within a submodel can simply be the Level 1 namespace rules described here, with each submodel having its own namespace that is global within that submodel.


3.5 Formulas

Formulas in SBML Level 1 are expressed in text string form. They are used in the definitions of kinetic laws (Section 4.7.2) and in rules (Section 4.6). The formula strings are interpreted as expressions that evaluate to a floating-point value of type double. The formula strings may contain operators, function calls, symbols, and white space characters. The allowable white space characters are tab and space. Table 3 presents the precedence rules for the different entities that may appear in formula strings. All operators in formulas return double values.


Table 3: A table of the expression operators available in SBML. In the Class column, ``operand'' implies the construct is an operand, ``prefix'' implies the operation is applied to the following arguments, ``unary'' implies there is one argument, and ``binary'' implies there are two arguments. The values in the Precedence column show how the order of different types of operation is determined. For example, the expression $a * b + c$ is evaluated as $(a * b) + c$ because the * operator has higher precedence. The Associates column shows how the order of operations of equal precedence is determined; for example, $a - b + c$ is evaluated as $(a - b) + c$ because the $+$ and $-$ operators are left-associative. The precedence and associativity rules are taken from the C programming language (Kernighan and Ritchie, 1988; Harbison and Steele, 1995), except for the symbol ^, which is used in C for a different purpose.
Tokens        Operation            Class    Precedence  Associates
name          symbol reference     operand  6           n/a
(expression)  expression grouping  operand  6           n/a
f(...)        function call        prefix   6           left
-             negation             unary    5           right
^             power                binary   4           left
*             multiplication       binary   3           left
/             division             binary   3           left
+             addition             binary   2           left
-             subtraction          binary   2           left
,             argument delimiter   binary   1           left


The function call syntax consists of a function name, followed by optional white space, followed by an opening parenthesis token (`('), followed by a sequence of zero or more arguments separated by commas (with each comma optionally preceded and/or followed by zero or more white space characters), followed by a closing parenthesis (`)') token. The function name must be chosen from one of the functions available in SBML. Table 6 in Appendix C lists the basic mathematical functions that are defined in SBML at this time, while Table 7 lists a large number of common rate law functions defined in SBML. The names of these predefined functions are reserved and make up the bulk of the list of names in Table 2.
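To make the precedence and associativity rules of Table 3 and the function call syntax concrete, here is a minimal recursive-descent evaluator sketched in Python, with one method per precedence level. Several details are assumptions of this sketch rather than statements of the specification: numeric literals are accepted (Table 3 does not list them), only a handful of predefined functions are mapped, and those are given their usual mathematical meaning.

```python
import math
import operator
import re

# Numbers, SName-style names and the one-character operators of Table 3.
# Only space and tab are accepted as white space.
_TOKEN = re.compile(
    r"[ \t]*(?:(?P<num>(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
    r"|(?P<name>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<op>[-+*/^(),]))"
)

# Hypothetical subset of the predefined functions, given their usual
# mathematical meaning; Table 6 in Appendix C is the authority.
FUNCTIONS = {"abs": abs, "exp": math.exp, "log": math.log,
             "pow": math.pow, "sqrt": math.sqrt}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip(" \t")
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


class _Parser:
    def __init__(self, tokens, env, funcs):
        self.toks, self.i, self.env, self.funcs = tokens, 0, env, funcs

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else (None, None)

    def take(self, expected=None):
        kind, text = self.peek()
        if kind is None or (expected is not None and text != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {text!r}")
        self.i += 1
        return kind, text

    def _left_assoc(self, ops, operand):
        # Shared loop for the left-associative binary levels 4, 3 and 2.
        value = operand()
        while self.peek()[1] in ops:
            value = ops[self.take()[1]](value, operand())
        return value

    def expr(self):   # precedence 2: + and -
        return self._left_assoc({"+": operator.add, "-": operator.sub}, self.term)

    def term(self):   # precedence 3: * and /
        return self._left_assoc({"*": operator.mul, "/": operator.truediv}, self.power)

    def power(self):  # precedence 4: ^, left-associative as listed in Table 3
        return self._left_assoc({"^": math.pow}, self.unary)

    def unary(self):  # precedence 5: negation, right-associative
        if self.peek()[1] == "-":
            self.take()
            return -self.unary()
        return self.primary()

    def primary(self):  # precedence 6: numbers, names, calls, parentheses
        kind, text = self.take()
        if kind == "num":
            return float(text)
        if kind == "name" and self.peek()[1] == "(":
            self.take("(")
            args = [] if self.peek()[1] == ")" else [self.expr()]
            while self.peek()[1] == ",":
                self.take(",")
                args.append(self.expr())
            self.take(")")
            if text not in self.funcs:
                raise NameError(f"unknown function {text!r}")
            return float(self.funcs[text](*args))
        if kind == "name":
            if text not in self.env:
                raise NameError(f"undefined symbol {text!r}")
            return float(self.env[text])
        if text == "(":
            value = self.expr()
            self.take(")")
            return value
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate(formula, env, funcs=FUNCTIONS):
    """Evaluate a formula string to a float, given values for its names."""
    parser = _Parser(tokenize(formula), env, funcs)
    value = parser.expr()
    if parser.i != len(parser.toks):
        raise SyntaxError("unexpected trailing input")
    return value
```

For instance, evaluate("Vm*S/(Km+S)", {"Vm": 10, "S": 2, "Km": 2}) gives 5.0. Read literally, Table 3 gives negation a higher precedence than ^, so this sketch evaluates -2^2 as (-2)^2 = 4, and left associativity makes 2^3^2 equal to (2^3)^2 = 64. Both results differ from common mathematical convention, which is worth keeping in mind when exchanging formulas between tools.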

A program parsing a formula in an SBML model should assume that name tokens other than function names are names of parameters, compartments or species. When a species name occurs in a formula, it represents the concentration (i.e., $substance/volume$) of the species. When a compartment name occurs in a formula, it represents the volume of the compartment. The units of substance and volume are determined from the built-in substance and volume of Table 5.
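The name-resolution rule above can be sketched as building a table of values before a formula is evaluated. In this Python sketch the dictionary layout and the function name formula_environment are hypothetical; it assumes species amounts are already expressed in the model's substance units and compartment volumes in its volume units.

```python
def formula_environment(compartments, species, parameters):
    """Values that names in a formula stand for.

    compartments: compartment name -> volume
    species:      species name -> (amount, compartment name)
    parameters:   parameter name -> value
    """
    env = dict(parameters)
    env.update(compartments)  # a compartment name denotes its volume
    for name, (amount, compartment) in species.items():
        # a species name denotes its concentration, substance / volume
        env[name] = amount / compartments[compartment]
    return env
```

For a species S1 with amount 0.5 in a compartment cell of volume 2, the name S1 therefore evaluates to 0.25, while cell evaluates to 2.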

Readers may wonder why mathematical formulas in SBML are not expressed using MathML (W3C, 2000b), an XML-based mathematical formula language. Although using MathML would be more in the spirit of using XML and would in some ways be a more forward-looking choice, it would require simulation software to use fairly complex parsers to read and write the resulting SBML. Most contemporary systems biology simulation software simply represents mathematical formulas using text strings. To keep SBML Level 1 simple and compatible with known simulation software, we chose to represent formulas as strings. This does not preclude a later level of SBML from introducing the ability to use MathML as an extension.


4 SBML Components

In this section, we define each of the major data structures in SBML. To provide illustrations of their use, we give partial XML encodings of SBML model components, but we leave full XML examples to Section 5.


4.1 Models

The Model structure is the highest-level construct in an SBML data stream or document. The UML definition of Model is shown in Figure 4. Only one component of type Model is allowed per instance of an SBML document or data stream, although it does not necessarily need to represent a single biological entity.

Figure 4: The definition of Model. Additional fields are inherited from SBase.

Model serves as a container for UnitDefinition, Compartment, Species, Parameter, Rule, and Reaction components. All of these components are optional; that is, the lists in each of the respective fields are permitted to have zero length. (However, there are dependencies between components, such that defining some requires defining others. See in particular Section 4.4 on Species.) An instance of a Model may also have an optional name field that can be used to give the model a name. The name must be a text string conforming to the syntax permitted by the SName data type described in Section 3.3.

In the XML encoding of an SBML model, the lists of unit definitions, compartments, species, parameters, rules, and reactions are translated into lists of XML elements that each have headings of the form listOf___s, where the blank is replaced by the name of the component type (e.g., ``Reaction''). The resulting XML data object has the form illustrated by the following skeletal model:


\begin{example}
<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://...
    ...les>
    <listOfReactions>
      ...
    </listOfReactions>
  </model>
</sbml>
\end{example}

Readers may wonder about the motivations for the listOf___s notation. A simpler approach to creating the lists of components would be to place them all directly at the top level under <model> ... </model>. We chose instead to group them within XML elements named listOf___s, because we believe this helps organize the components and makes visual reading of model definitions easier.
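As an illustration of the listOf___s grouping, the following Python sketch uses xml.etree.ElementTree to emit a skeletal document with the lists in the order given in Section 2. The namespace URI and the level and version attributes are what we assume a Level 1 Version 2 document carries and should be checked against the published schema; leaving out lists that are not supplied is likewise an assumption of the sketch, and skeletal_model is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace and level/version values; verify against the schema.
SBML_NS = "http://www.sbml.org/sbml/level1"

# Order of the lists in the top-level model definition (Section 2).
LIST_ORDER = ["UnitDefinitions", "Compartments", "Species",
              "Parameters", "Rules", "Reactions"]


def skeletal_model(name, lists):
    """Build a skeleton from {"Compartments": [(tag, attrs), ...], ...}."""
    # xmlns is written as a plain attribute so that the output carries a
    # default namespace declaration without ElementTree prefixing tags.
    sbml = ET.Element("sbml", {"xmlns": SBML_NS, "level": "1", "version": "2"})
    model = ET.SubElement(sbml, "model", {"name": name})
    for plural in LIST_ORDER:
        components = lists.get(plural) or []
        if not components:  # lists not supplied are left out in this sketch
            continue
        wrapper = ET.SubElement(model, "listOf" + plural)
        for tag, attrs in components:
            ET.SubElement(wrapper, tag, attrs)
    return ET.tostring(sbml, encoding="unicode")
```

Because the wrappers are emitted from LIST_ORDER, the lists come out in a fixed order regardless of the order in which the caller supplies them.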


4.2 Unit Definitions

Units may be supplied in a number of contexts in an SBML model. A facility for defining units is convenient to have so that combinations of units can be given abbreviated names. This is the motivation behind the UnitDefinition data structure, whose definition is shown in Figure 5.

Figure 5: The definition of UnitDefinition.

A unit definition consists of a name field of type SName and an optional list of structures of type Unit. The approach to defining units in SBML is compositional; for example, $meter\ second^{-2}$ is constructed by combining a Unit-type element representing $meter$ with a Unit-type element representing $second^{-2}$. The Unit structure has one required attribute, kind, whose value must be a name taken from the list of units in Table 4. The optional exponent field on Unit represents an exponent on the unit. Its default value is ``1'' (one). In the example just mentioned, $second^{-2}$ is obtained by using kind="second" and exponent="-2". Finally, a Unit structure also has an optional scale field; its value must be an integer exponent on a power-of-ten multiplier used to set the scale of the unit. For example, a unit that has a kind value of ``gram'' and a scale value of ``-3'' signifies $10^{-3} * gram$, or milligrams. The default value of scale is zero, because $10^0 = 1$.


Table 4: The possible values of kind in a UnitKind structure. All are names of base or derived SI units, except for ``dimensionless'' and ``item'', which are SBML additions important for handling certain common cases. ``Dimensionless'' is intended for cases where a quantity does not have units, and ``item'' is needed in certain contexts to express such things as ``N items'' (e.g., ``100 molecules''). Although ``Celsius'' should be capitalized, for simplicity SBML requires that all unit names be treated in a case-insensitive manner. Also, note that the gram and liter/litre are not strictly part of SI (Bureau International des Poids et Mesures, 2000); however, they are so commonly used in SBML's areas of application that they are included as predefined unit names. (The standard SI unit of mass is in fact the kilogram, and volume is defined in terms of cubic meters.)
ampere farad joule lumen ohm steradian
becquerel gram katal lux pascal tesla
candela gray kelvin meter radian volt
celsius henry kilogram metre second watt
coulomb hertz liter mole siemens weber
dimensionless item litre newton sievert
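The compositional scheme can be sketched as a small Python data type. The class and function names are hypothetical. Validation uses the names of Table 4, compared case-insensitively as its caption requires. The scale_factor helper assumes that each Unit denotes $(10^{scale} * kind)^{exponent}$; the text above only fixes the meaning of scale for an exponent of 1, so treat that combination rule as an assumption.

```python
from dataclasses import dataclass
from fractions import Fraction

# The unit names of Table 4, compared case-insensitively.
UNIT_KINDS = frozenset("""
    ampere becquerel candela celsius coulomb dimensionless farad gram gray
    henry hertz item joule katal kelvin kilogram liter litre lumen lux meter
    metre mole newton ohm pascal radian second siemens sievert steradian
    tesla volt watt weber
""".split())


@dataclass(frozen=True)
class Unit:
    kind: str           # required; a Table 4 name
    exponent: int = 1   # defaults to 1
    scale: int = 0      # power-of-ten multiplier; 0 means 10**0 == 1

    def __post_init__(self):
        if self.kind.lower() not in UNIT_KINDS:
            raise ValueError(f"{self.kind!r} is not a Table 4 unit kind")


def describe(units):
    """Human-readable form of a UnitDefinition's list of Unit structures."""
    parts = []
    for u in units:
        base = u.kind if u.scale == 0 else f"(10^{u.scale} {u.kind})"
        parts.append(base if u.exponent == 1 else f"{base}^{u.exponent}")
    return " ".join(parts)


def scale_factor(units):
    """Overall power of ten relative to the unscaled kinds.

    ASSUMPTION: each Unit stands for (10**scale * kind)**exponent.
    """
    return Fraction(10) ** sum(u.scale * u.exponent for u in units)
```

For example, one plausible encoding of the ``mmls'' abbreviation defined next is [Unit("mole", scale=-3), Unit("litre", exponent=-1), Unit("second", exponent=-1)], for which describe returns "(10^-3 mole) litre^-1 second^-1" and scale_factor returns Fraction(1, 1000).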


Unit combinations are constructed by listing several Unit structures inside a UnitDefinition-type structure. The following example illustrates the definition of an abbreviation named ``mmls'' for the units $mmol\ l^{-1}\ s^{-1}$:


\begin{example}
<listOfUnitDefinitions>
  <unitDefinition name="mmls">
    <listOf...
      ...1"/>
    </listOfUnits>
  </unitDefinition>
</listOfUnitDefinitions>
\end{example}

There are three special unit names in SBML, listed in Table 5, corresponding to the three types of quantities that play roles in biochemical reactions: amount of substance, volume and time. SBML defines default units for these quantities, all with a default scale value of 0. The various components of a model, such as parameters, can use only the predefined units from Table 4, new units defined in unit definitions, or the three predefined names ``substance'', ``time'', and ``volume'' from Table 5. The latter usage signifies that the units to be used should be the designated defaults.


Table 5: SBML's built-in quantities. Each of these units has a default scale value of 0.
Name       Allowable Units               Default Units
substance  moles or number of molecules  moles
volume     liters                        liters
time       seconds                       seconds


A model may change the default scales by reassigning the special unit names ``substance'', ``time'', and ``volume'' in a unit definition. This takes advantage of the UnitDefinition structure's facility for defining scales on units. The following example changes the default units of volume to be milliliters:


\begin{example}
<model>
  ...
  <listOfUnitDefinitions>
    <unitDefinition name="vo...
      ...Units>
    </unitDefinition>
  </listOfUnitDefinitions>
  ...
</model>
\end{example}

If the definition above appeared in a model, the default units of volume for all components that do not explicitly specify other units would become milliliters.
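The lookup order implied by this section (a model's own unit definitions first, then the built-in defaults of Table 5, otherwise a predefined unit name from Table 4) can be sketched in Python as follows. The function name and the (kind, exponent, scale) tuple representation are hypothetical.

```python
# Built-in default units from Table 5, all with scale 0.
BUILTIN_DEFAULTS = {"substance": "mole", "volume": "liter", "time": "second"}


def effective_units(units, unit_definitions):
    """Resolve a units attribute to a list of (kind, exponent, scale) tuples.

    unit_definitions maps UnitDefinition names to such lists; a definition
    named "substance", "volume" or "time" overrides the built-in default.
    """
    if units in unit_definitions:
        return unit_definitions[units]
    if units in BUILTIN_DEFAULTS:
        return [(BUILTIN_DEFAULTS[units], 1, 0)]
    # Otherwise the name must be one of the predefined units of Table 4.
    return [(units, 1, 0)]
```

With the milliliter redefinition above, a component whose units resolve to ``volume'' picks up the scaled definition, while one that names ``second'' directly is unaffected.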


4.3 Compartments

A compartment in SBML represents a bounded volume in which species are located. Compartments do not necessarily have to correspond to actual structures inside or outside of a cell, although models are often designed that way. The definition of Compartment is shown in Figure 6.

Figure 6: The definition of Compartment. Fields inherited from SBase are omitted here but are assumed.

Compartment has one required field, name, to give it a unique name by which other parts of an SBML model definition can refer to it. A compartment can also have an optional floating-point field called volume representing the total volume of the compartment. This enables concentrations of species to be calculated in the absence of spatial geometry information. The volume attribute defaults to a value of ``1'' (one). The units of volume may be explicitly set using the optional field units. The value of this attribute must be one of the following: a predefined unit name from Table 4, the term ``volume'' (which, if used, signifies that the default units of volume should be used; see Section 4.2), or the name of a unit defined by a unit definition in the Model. If units is absent, as in the example below, the units default to the value set by the built-in ``volume''.

The optional field outside of type SName can be used to express containment relationships between compartments. If present, the value of outside for a given compartment must be the name of another compartment enclosing it, or in other words, the compartment that is ``outside'' of it. This enables the representation of simple topological relationships between compartments, for those simulation systems that can make use of the information (e.g., for drawing simple diagrams of compartments). Although containment relationships are partly taken into account by the compartmental localization of reactants and products, it is not always possible to determine purely from the reaction equations whether one compartment is meant to be located within another. In the absence of a value for outside, compartment definitions in SBML Level 1 do not have any implied spatial relationships between each other.

In an XML data stream containing an SBML model, compartments are listed inside an XML element called listOfCompartments within a Model-type data structure. (See the discussion of Model in Section 4.1.) The following example illustrates two compartments in an abbreviated SBML example of a model definition:
\begin{example}
<model>
...
<listOfCompartments>
<compartment name=''cytosol'...
...tochondria'' volume=''0.3''/>
</listOfCompartments>
...
</model>
\end{example}

The following is an example of using outside to model a cell membrane. To express that a compartment named B has a membrane that is modeled as another compartment M, which in turn is located within another compartment A, one would write:
\begin{example}
<model>
...
<listOfCompartments>
<compartment name=''A''/>
<...
...nt name=''B'' outside=''M''/>
</listOfCompartments>
...
</model>
\end{example}
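Each outside link points from a compartment to the one that encloses it. A program that wants to draw nested compartments can therefore recover the full nesting by following those links. The Python sketch below writes out the three-compartment arrangement described in the text. The XML is our own, since the example above is abbreviated. The sketch then walks the chain outward from B. The helper enclosing_chain is hypothetical, and its rejection of cyclic links is our own assumption about what a sensible containment description looks like.

```python
import xml.etree.ElementTree as ET

# The arrangement described above: B lies inside its membrane M, and M inside A.
NESTED = """
<listOfCompartments>
  <compartment name="A"/>
  <compartment name="M" outside="A"/>
  <compartment name="B" outside="M"/>
</listOfCompartments>
"""


def enclosing_chain(xml_text, name):
    """List the compartments enclosing `name`, innermost first.

    Raises ValueError if a link names an undefined compartment or if the
    'outside' links loop back on themselves.
    """
    outside = {c.get("name"): c.get("outside")
               for c in ET.fromstring(xml_text).iter("compartment")}
    if name not in outside:
        raise ValueError(f"undefined compartment: {name}")
    chain = []
    current = outside[name]
    while current is not None:
        if current not in outside:
            raise ValueError(f"'outside' refers to undefined compartment: {current}")
        if current == name or current in chain:
            raise ValueError("cyclic 'outside' relationships")
        chain.append(current)
        current = outside[current]
    return chain


print(enclosing_chain(NESTED, "B"))  # ['M', 'A']
```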


4.4 Species

The term species refers to entities that take part in reactions. These include simple ions (e.g., protons, calcium), simple molecules (e.g., glucose, ATP), and large molecules (e.g., RNA, polysaccharides, and proteins). The Species data structure is intended to represent these entities. Its definition is shown in Figure 7.

Figure 7: The definition of Species. As usual, fields inherited from SBase are omitted here but are assumed.


\includegraphics[scale = 0.65]{figs/species}

Species has a required name field of type SName. The required field compartment, also of type SName, is used to identify the compartment in which the species is located. The field initialAmount, of type double, is used to set the initial amount of the species in the named compartment. The units of this quantity may be set explicitly using the optional field units. The value of units must be chosen from one of the following possibilities: a predefined unit name from Table 4, the term ``substance'' (which, if present, signifies that the default units of quantity should be used--see Section 4.2), or a new unit name defined by a unit definition in the enclosing Model. If absent, the units default to the value set by the built-in ``substance''.

The optional boolean field boundaryCondition determines whether the amount of the species is fixed or variable over the course of a simulation. The value of boundaryCondition defaults to ``false'', indicating that by default, the amount is not fixed. If the amount of a species is defined as being fixed, it implies that some external mechanism maintains a constant quantity in the compartment throughout the course of a reaction. (The term boundary condition alludes to the role of this constraint in a simulation.)
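The following Python sketch shows one way a simulator might honor boundaryCondition. Species flagged as boundary species keep their amount, while the others are updated from their reaction rates. The explicit Euler update, the rate values, and the helper names read_species and euler_step are our own illustrative choices. SBML Level 1 does not prescribe a simulation method.

```python
import xml.etree.ElementTree as ET

SPECIES = """
<listOfSpecies>
  <species name="Glucose" compartment="cell" initialAmount="4"/>
  <species name="ATP" compartment="cell" initialAmount="2" boundaryCondition="true"/>
</listOfSpecies>
"""


def read_species(xml_text):
    """Return (initial amounts, names of boundary species).

    boundaryCondition is treated as false when the attribute is absent;
    the XML boolean forms "true" and "1" both count as true.
    """
    amounts, boundary = {}, set()
    for sp in ET.fromstring(xml_text).iter("species"):
        amounts[sp.get("name")] = float(sp.get("initialAmount"))
        if sp.get("boundaryCondition", "false") in ("true", "1"):
            boundary.add(sp.get("name"))
    return amounts, boundary


def euler_step(amounts, rates, boundary, dt):
    """Advance amounts by one explicit Euler step, holding boundary species fixed."""
    updated = {}
    for name, amount in amounts.items():
        if name in boundary:
            updated[name] = amount  # maintained by an external mechanism
        else:
            updated[name] = amount + dt * rates.get(name, 0.0)
    return updated


amounts, boundary = read_species(SPECIES)
print(euler_step(amounts, {"Glucose": -1.0, "ATP": -0.5}, boundary, 0.5))
# {'Glucose': 3.5, 'ATP': 2.0}
```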

The optional field charge is an integer indicating the charge on the species (in terms of electrons, not the SI unit coulombs). This may be useful when the species involved is a charged ion such as calcium (Ca$^{2+}$).

The following example shows two species definitions within an abbreviated SBML model definition. The example shows that species are listed under the heading listOfSpecies in the model:
\begin{example}
<model>
...
<listOfSpecies>
<species name=''Glucose'' compart...
...=''cell'' initialAmount=''0.75''/>
</listOfSpecies>
...
</model>
\end{example}

In SBML Level 1 Version 2, the term specie (used in SBML Level 1 Version 1) has been replaced with the more commonly accepted spelling species throughout the specification. Models written in SBML Level 1 Version 2 format should use the new spelling. However, for backwards compatibility, software packages intended to be conformant with SBML Level 1 Version 2 should accept both spellings on input for all elements and attributes where the term occurs. Beginning with SBML Level 2, the specie spelling will be removed entirely and only species will be used.
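One simple way to accept both spellings on input is to rename element tags and attribute names before doing any further processing. This Python sketch does that with a regular expression. The expression turns ``specie'' into ``species'' unless the word is already followed by ``s''. The function names modernize and normalize_spelling are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# "specie" or "Specie" not already followed by "s"; names such as
# "listOfSpecies" and "speciesConcentrationRule" are left untouched.
_OLD_SPELLING = re.compile(r"([sS])pecie(?!s)")


def modernize(name):
    """Rewrite the Version 1 spelling inside a single tag or attribute name."""
    return _OLD_SPELLING.sub(r"\1pecies", name)


def normalize_spelling(root):
    """Rename element tags and attribute names in place, then return root."""
    for elem in root.iter():
        elem.tag = modernize(elem.tag)
        old_attrs = dict(elem.attrib)
        elem.attrib.clear()
        for key, value in old_attrs.items():
            elem.set(modernize(key), value)
    return root


doc = ET.fromstring('<listOfSpecies><specie name="Glucose" compartment="cell"/></listOfSpecies>')
normalize_spelling(doc)
print(doc[0].tag)  # species
```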

Finally, note that the definition of Species in SBML requires a species in a model to be located within a compartment. This means that at least one compartment must be defined in an SBML model that defines any species. The only exception to this is the case of degenerate models that have no species or reactions.
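A validator can enforce this requirement by checking that every species names a compartment that actually exists. A model that defines species but no compartments then fails for every species. The Python sketch below uses a hypothetical helper, misplaced_species, and a made-up model in which one species refers to an undefined compartment.

```python
import xml.etree.ElementTree as ET

MODEL = """
<model>
  <listOfCompartments>
    <compartment name="cell"/>
  </listOfCompartments>
  <listOfSpecies>
    <species name="Glucose" compartment="cell" initialAmount="0.75"/>
    <species name="Pyruvate" compartment="mitochondria" initialAmount="0"/>
  </listOfSpecies>
</model>
"""


def misplaced_species(xml_text):
    """Return the names of species whose compartment is not defined in the model."""
    root = ET.fromstring(xml_text)
    compartments = {c.get("name") for c in root.iter("compartment")}
    return [s.get("name") for s in root.iter("species")
            if s.get("compartment") not in compartments]


print(misplaced_species(MODEL))  # ['Pyruvate']
```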




4.5 Parameters

A Parameter structure is used to associate a name with a floating-point value so that the symbol can be used in formulas in place of the value. The definition of Parameter is shown in Figure 8.

Figure 8: The definition of Parameter.


\includegraphics[scale = 0.65]{figs/parameter}

The Parameter structure has one required field, name, representing the parameter's name in the model. The optional field value determines the value (of type double) assigned to the symbol. The units of the parameter value are specified by the field units. The value assigned to units must be chosen from one of the following possibilities: one of the base unit names from Table 4; one of the three names ``substance'', ``time'', or ``volume'' (see Table 5); or the name of a new unit defined in the list of unit definitions in the enclosing Model structure.

Parameters can be defined in two places in SBML: in lists of parameters defined at the top level in a Model-type structure (in the listOfParameters described in Section 4.1), and within individual reaction definitions (as described in Section 4.7). Parameters defined at the top level are global to the whole model; parameters that are defined within a reaction are local to the particular reaction and (within that reaction) override any global parameters having the same names. (See Section 3.4 for further details.)
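The scoping rule amounts to a two-level lookup. Parameters defined within the reaction are searched first, and Model-level parameters are searched only if the name is not found there. Python's collections.ChainMap expresses this lookup directly. The helper reaction_scope and the parameter names and values are illustrative only.

```python
from collections import ChainMap


def reaction_scope(model_params, reaction_params):
    """Return a lookup view in which reaction-local parameters shadow global ones."""
    return ChainMap(reaction_params, model_params)


model_params = {"k1": 0.1, "Vmax": 2.0}   # defined at the Model level
reaction_params = {"k1": 0.5}             # defined within one reaction

scope = reaction_scope(model_params, reaction_params)
print(scope["k1"], scope["Vmax"])              # 0.5 2.0
print(reaction_scope(model_params, {})["k1"])  # 0.1
```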

The following is an example of parameters defined at the Model level:


\begin{example}
<model>
...
<listOfSpecies>
...
</listOfSpecies>
<listOfPar...
...ameters>
<listOfReactions>
...
</listOfReactions>
...
</model>
\end{example}

An example of a full model that uses parameters is presented in Section 5.3.


4.6 Rules

In SBML, rules provide a way to create constraints on variables for cases in which the constraints cannot be expressed using reactions (Section 4.7) or by assigning an initial value to a component in a model. There are two orthogonal dimensions along which rules can be described. First, there are three possible functional forms, corresponding to the following three general cases (where $ x$ is a variable, $ f$ is some arbitrary function, and $ W$ is a vector of parameters and variables that may include $ x$):

1. (Algebraic rule) left-hand side is zero: $ 0 = f(W)$
2. (Scalar rule) left-hand side is a scalar: $ x = f(W)$
3. (Rate rule) left-hand side is a rate of change: $ dx/dt = f(W)$

The second dimension concerns the role of the variable $ x$ in the equations above: $ x$ can be the name of a compartment (to set its volume), the name of a species (to set its concentration), or a parameter name (to set its value).

In their general form given above, there is little to distinguish between scalar and algebraic rules. They are treated as separate cases for the following reasons:

The approach SBML takes to cover these cases is to define an abstract Rule structure that contains just one field, formula, to hold the right-hand side expression, and then to derive subtypes of Rule that add fields to cover the various cases above. Figure 9 gives the definitions of Rule and the subtypes derived from it. The figure shows that AlgebraicRule is derived directly from Rule, whereas CompartmentVolumeRule, SpeciesConcentrationRule, and ParameterRule are all derived from an intermediate abstract structure called AssignmentRule.

Figure 9: The definition of Rule and derived types.

\includegraphics[scale = 0.65]{figs/rule}
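The hierarchy in Figure 9 maps naturally onto class inheritance. The Python dataclasses below are a sketch of that mapping, not a definitive rendering of the figure. The text names only the formula and type fields. The target field names compartment, species, and name on the three concrete subtypes, and the default of ``scalar'' for type, are assumptions made here because the figure is not reproduced in the text.

```python
from dataclasses import dataclass
from enum import Enum


class RuleType(Enum):
    SCALAR = "scalar"  # x = f(W)
    RATE = "rate"      # dx/dt = f(W)


@dataclass
class Rule:
    """Abstract in SBML (not enforced here); holds the right-hand side f(W)."""
    formula: str


@dataclass
class AlgebraicRule(Rule):
    """Adds no fields: the rule means 0 = f(W)."""


@dataclass
class AssignmentRule(Rule):
    """Abstract in SBML (not enforced here); the SCALAR default is assumed."""
    type: RuleType = RuleType.SCALAR


@dataclass
class CompartmentVolumeRule(AssignmentRule):
    compartment: str = ""  # assumed field: x is this compartment's volume


@dataclass
class SpeciesConcentrationRule(AssignmentRule):
    species: str = ""      # assumed field: x is this species' concentration


@dataclass
class ParameterRule(AssignmentRule):
    name: str = ""         # assumed field: x is this parameter's value


rule = ParameterRule(formula="k1 * S1", name="k2")
print(rule.type.value, isinstance(rule, AssignmentRule))  # scalar True
```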

The type field introduced in AssignmentRule is an enumeration of type RuleType that determines whether a rule falls into the scalar or rate category in the list of cases above. In SBML Level 1, the enumeration has two possible values: ``scalar'' and ``rate''. The former means that the expression has a scalar value on the left-hand side [i.e., $ x = f(W)$, as in case 2 in the list above]; the latter means that the expression has a rate of change on the left-hand side [i.e., $ dx/dt = f(W)$, as in case 3 in the list above]. Future releases of SBML may add to the possible values of RuleType.
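The practical difference between the two RuleType values is what a simulator does with the current value of $ f(W)$. A scalar rule replaces $ x$ outright. A rate rule only determines how $ x$ changes over a time step. In this Python sketch the rate case is integrated with a single explicit Euler step, which is our illustrative choice since SBML Level 1 does not prescribe an integration method. The helper apply_assignment is hypothetical.

```python
def apply_assignment(value, rhs, rule_type, dt):
    """Return the new value of x given the current value of f(W).

    "scalar": x = f(W), so x simply takes the value of f(W).
    "rate":   dx/dt = f(W), approximated by one explicit Euler step of size dt.
    """
    if rule_type == "scalar":
        return rhs
    if rule_type == "rate":
        return value + dt * rhs
    raise ValueError(f"unknown RuleType: {rule_type}")


print(apply_assignment(1.0, 3.0, "scalar", 0.5))  # 3.0
print(apply_assignment(1.0, 3.0, "rate", 0.5))    # 2.5
```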

4.6.1 AlgebraicRule

The rule type AlgebraicRule is used to express equations whose left-hand sides are zero. AlgebraicRule does not add any fields to the basic Rule; its role is simply to distinguish this case from the other cases.
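An algebraic rule states a constraint $ 0 = f(W)$ rather than a value to assign, so a simulator must find values that satisfy it. The Python sketch below shows one possible approach for the simplest case, where one unknown is solved for while the rest of $ W$ is held fixed. It uses bisection, a technique chosen here purely for illustration and not prescribed by SBML. The conservation constraint in the usage line and the helper solve_algebraic are hypothetical.

```python
def solve_algebraic(f, lo, hi, tol=1e-12, max_iter=200):
    """Find x in [lo, hi] with f(x) close to 0 by bisection.

    f is the right-hand side of an algebraic rule 0 = f(W), viewed as a
    function of a single unknown with the other members of W held fixed.
    f(lo) and f(hi) must have opposite signs.
    """
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must bracket a root")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if abs(f_mid) <= tol or (hi - lo) / 2 <= tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid   # root lies in the upper half
    return (lo + hi) / 2


# Hypothetical conservation constraint 0 = S1 + S2 - total, solved for S2.
S1, total = 2.0, 5.0
print(solve_algebraic(lambda s2: S1 + s2 - total, 0.0, 8.0))  # 3.0
```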

4.6.2 SpeciesConcentrationRule