
I. PROGRAM SUMMARY

A. GENERAL GOALS AND STATEMENT OF PURPOSE

1. Background

Five years ago we imagined that technical advances in biology would eventually make it possible to construct a virtual cell. Spurred on by this long-range vision, we proposed establishment of the Alliance for Cellular Signaling (AfCS) to take comprehensive advantage of these advances, including the availability of complete genomic sequences and expanding capabilities to query the entire genome, manipulate gene expression, and quantify activities of macro-molecules in vivo. The AfCS has acquired other enabling technologies (e.g., low-cost, internet-based video communication and collaboration) and procured adequate funding from a consortium of NIH institutes, pharmaceutical companies, and philanthropists. In its four years of operation the AfCS has taken the first exciting steps toward its imagined goal.

The AfCS at present represents a unique collaboration among 27 participating investigators at 16 academic sites, employs a staff of approximately 100 Ph.D. scientists, technicians, and bioinformatics experts, and interacts productively with several commercial enterprises. AfCS experiments are exclusively carried out in seven dedicated AfCS experimental laboratories and not in the individual labs of our participating investigators.

Two significant changes from our initial experimental plan have allowed the AfCS to make impressive headway. First, experiments now focus exclusively on the cultured RAW 264.7 macrophage, rather than on splenic B cells and cultured cardiac myocytes as proposed originally (see below). This change allows us to acquire, under highly standardized conditions, the vast amounts and rich kinds of data required for detailed quantitative analysis of cellular signaling. Second, experiments deliberately focus on large but well-defined subsets of this cell's signaling network. Together, these changes have allowed us to begin to answer the critical, over-arching questions we posed in the initial AfCS proposal, five years ago. The questions include:

Changes in our experimental focus have been facilitated by significant changes in AfCS governance (see Administrative Management Plan). As before, direction of AfCS programs is the responsibility of a Steering Committee, which works in conjunction with a System (Macrophage) Committee that provides more specific direction to the laboratories, a Data Analysis Group, an Editorial Board, and seven dedicated AfCS Laboratories. Members of the initial Steering Committee were effective when the effort was launched, but several were unable to track AfCS activities on a day-to-day basis. Many of those now serve on the External Advisory Committee, while the present Steering Committee comprises a cohesive group of highly interactive individuals, each intimately familiar with details of AfCS endeavors; representatives of our sponsors also serve as valuable members of this group. Two System Committees (B Cell and Myocyte) are now one (Macrophage). We have created a Data Analysis Group to meet the rapidly growing challenges, both statistical and conceptual, of analyzing copious data. We are currently establishing a strong interface between data analysts and those involved in our increasingly intense effort to model quantitatively the behavior of RAW cell signaling systems.

2. Progress

In the interest of coherence, we provide a brief overview here. Overall and detailed Progress Reports are presented elsewhere (Section II). This section is admittedly redundant, but we believe that its placement here will add significantly to understanding of our overall discussion of goals and strategies (section I.C, below).

a. Choice of a New Cell Type for Study. By far the most significant change from our original plan has been the shift to studying a cultured macrophage cell line. We appreciated that primary B lymphocytes and cardiac myocytes would pose significant experimental difficulties, but thought that the potential relevance of highly differentiated, relatively "normal" cells made the risks worthwhile. We also adopted a cultured B cell line, WEHI 231, to complement the resting splenic B cell and permit manipulation of gene expression in ways that would not be possible with normal B cells, which survive in culture only briefly. The primary cells did pose many foreseeable problems (e.g., effort and expense of cell preparation, variability of responses), but ironically were discarded for a quite different and completely unforeseen reason: the complete refractoriness of WEHI 231 cells (and other B cell lines) to manipulation of gene expression with RNA interference (RNAi); this is a technology that we could only imagine when the project began. We thus decided in May 2003, in conjunction with our External Advisory Committee and sponsors, to forego the pleasures of primary cells. Eight mouse cell lines were tested and scored against criteria for a suitable cell system. Work on RAW 264.7 macrophages began in earnest during the latter half of 2003. Impressive progress with these cells during the past year shows that the time spent honing our skills on lymphocytes and myocytes was not wasted.

b. The RAW 264.7 Macrophage. These mouse cells are easily propagated in monolayer culture with a doubling time of under 20 hrs; they are diploid and chromosome number is stable. They express receptors for a large number of ligands, including many G protein-coupled receptors, interleukin receptors, Toll-like receptors, tyrosine kinase receptors, and others. Responses to ligands are sufficiently stable in long-term culture to define a workable window during which observations are reproducible. The cells can be transfected with DNA with 30-50% efficiency and — crucially — are susceptible to RNA interference. Morphology is suitable for high-resolution microscopy. Downstream, differentiated outputs of RAW cells include cytokine synthesis and secretion, macropinocytosis, phagocytosis, and chemotaxis. It is indeed challenging to find all desirable properties in a single cell; RAW cells approximate the ideal reasonably well.

c. Complexity of Signal Processing. Our first major goal was to perform broad assays to detect patterns of response to individual signaling inputs and interactions between combinations of inputs (single and double ligand screen). We have identified 22 ligands that cause reproducible, receptor-mediated effects on RAW cells and have nearly completed characterizing the effects of these ligands and their 231 possible pairwise combinations on Ca2+ and cyclic AMP concentrations, phosphorylation of 21 cellular signaling proteins, and secretion of 18 cytokines. Less comprehensive analysis has also been accomplished by examining changes in thousands of cellular mRNA transcripts and approximately 400 glycerophospholipids. Significant single and double ligand screens were also accomplished with splenic B cells. Results in both RAW and B cells reveal impressive interactions among responses to different ligands: that is, relatively high incidences of inhibitory or greater than additive responses for many assay end-points.
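The count of 231 double-ligand conditions follows directly from choosing unordered pairs among 22 ligands. A minimal sketch of that enumeration is given here; the ligand labels are placeholders invented for illustration, since only the count is taken from the text.

```python
from itertools import combinations

# Placeholder labels: the text gives the number of ligands (22), not this naming.
ligands = [f"ligand_{i:02d}" for i in range(1, 23)]

# Each unordered pair of distinct ligands is one double-ligand condition.
pairs = list(combinations(ligands, 2))

print(len(ligands))  # 22
print(len(pairs))    # 231, i.e. 22 * 21 / 2
```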

d. Structure of the Signaling Network. The AfCS Protein List, including nearly 4000 proteins of interest in cellular signaling, serves as the foundation of our Molecule Page Database. In turn, this database links to many other public data repositories and is being populated with detailed, standardized literature-based information entered by expert authors. AfCS data on expression and subcellular localization of these proteins in RAW cells is also being incorporated to constitute a comprehensive RAW Cell Parts List. Connectivity of the network is being examined by yeast two-hybrid analysis and by perturbations with drugs and RNAi.

In these and many other endeavors (see below), the AfCS has begun to focus detailed effort on conceptually and experimentally manageable signaling sub-networks, designated as part of the so-called FXM Project (FXM = Focus on X Modules, where X initially = Ca2+ and PIP3); initially, the FXM is measuring Ca2+ and PIP3 responses to only three ligands (C5a, UDP, and IgG2a). Detailed conceptual and computational maps of signaling pathways leading to Ca2+ and PIP3 have been constructed, and expression of relevant proteins in RAW cells has been assessed using microarrays, RT-PCR, and immunoblotting. AfCS laboratories have devised efficient procedures for perturbing RAW 264.7 cells by RNAi with virally encoded short hairpin RNA (shRNA) constructs; knockdowns are already being assessed at a rate of four per week. Thus within a year we should be able to knock down most of the ~200 signaling proteins included in current FXM modules. As the project proceeds we anticipate expanding this effort to include many hundreds of signaling proteins.
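The one-year estimate can be checked with simple arithmetic. The sketch below uses only the two figures quoted in the text (about 200 targets, four knockdowns assessed per week) and assumes, for illustration, that the rate stays constant.

```python
import math

TARGETS = 200        # approximate number of FXM signaling proteins (from the text)
RATE_PER_WEEK = 4    # knockdowns assessed per week (from the text)

# Assumption: the weekly rate is constant and each target needs one assessment.
weeks_needed = math.ceil(TARGETS / RATE_PER_WEEK)

print(weeks_needed)  # 50 weeks, i.e. just under one year
```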

A variety of functional assays have been applied to the ligand screens, and additional assays are being added as part of the FXM. We are paying increasing attention to assaying responses in single cells — an effort that will need to be substantially expanded.

e. Contributions to the Research Community. We have responded to this important part of our charge with an extensive website (www.signaling-gateway.org), maintained in collaboration with the Nature Publishing Group. The site presents AfCS data (including images), Molecule Pages, an antibody database, protocols, research reports, a newsletter, and access to more than 4000 reagents via a plasmid database and the American Type Culture Collection; additional useful content is incorporated by Nature.

3. Current Major Challenges

a. Digest the Ligand Screen Data to Define More Precisely Our Sphere of Interest. Goals of the ligand screen were to describe input/output relationships of the cell of interest and the patterns of response available to it and to identify combinations of ligands that produce non-additive responses. Non-additive interactions identify potential intersections between signaling modules, providing critical guidance for developing assays that will define network dynamics. The single and double ligand screen in RAW cells is nearly complete. To direct subsequent, more detailed experimental efforts (see below), we need to analyze carefully the large number of apparent non-additive responses already identified. The protein phosphorylation component of the screen will not be complete until late 2004. Last, we must evaluate how this very large but descriptive data set can be used to give information on the signaling network that underlies the many and complex interactions that we have discovered.
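One simple way to flag a non-additive response is to compare the double-ligand response with the sum of the two single-ligand responses. The following is a hedged sketch of that comparison, not the AfCS analysis procedure: the function name, the labels, and the 20% tolerance are all assumptions chosen for illustration.

```python
def classify_interaction(resp_a, resp_b, resp_ab, tolerance=0.2):
    """Compare a double-ligand response with the sum of the single responses.

    All responses are baseline-subtracted changes in one assay end-point.
    `tolerance` is an assumed fractional margin, not an AfCS parameter.
    """
    expected = resp_a + resp_b
    margin = tolerance * abs(expected)
    if resp_ab > expected + margin:
        return "greater than additive"
    if resp_ab < expected - margin:
        return "less than additive"
    return "additive"


# Hypothetical values for one end-point (arbitrary units).
print(classify_interaction(1.0, 1.0, 3.0))  # greater than additive
print(classify_interaction(1.0, 1.0, 2.1))  # additive
print(classify_interaction(1.0, 1.0, 0.5))  # less than additive
```

In practice the margin would come from the measured variability of each assay rather than a fixed fraction.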

b. Advance the FXM Project. The FXM’s goal is to dissect in detail a specified portion of the signaling network. We are learning how to walk before we run and how to integrate AfCS technologies and modes of analysis. Our capabilities here must grow to permit analysis of network connectivity and network dynamics. With respect to RNAi perturbations, we must first satisfy ourselves and others that the observed phenotypes are reproducible and result from either short-term disruption or longer-term adaptation of the system, rather than from off-target effects of our reagents or selection of mutant cells in the population. We must then target a more comprehensive list of FXM proteins for manipulation using RNAi; to date, more than 40 targets have been disrupted from a list of ~200. A major challenge will be to progress from analyzing connectivity to analyzing signaling dynamics. To do so, we must add to our repertoire assays that monitor information flow through multiple nodes in the pathways of interest — e.g., by assessing protein phosphorylation, intracellular translocation of proteins between compartments, and interactions between proteins (e.g., with fluorescence resonance energy transfer [FRET]).

c. Expand the FXM Project. The FXM serves as a paradigm and progenitor of future AfCS efforts to understand and model larger portions of the signaling network. In choosing such portions for study, we shall concentrate for the most part on responses that show non-additive interactions between ligands. We plan to allow the FXM to "ooze" gradually outwards — first by subjecting the accumulating number of FXM shRNA knockdown cell lines to additional assays with ligands not initially included in the FXM, and later by applying the FXM approach to additional signaling modules and applying RNAi and other perturbations to proteins beyond the FXM’s present scope. As mentioned, additional assays to be developed will include visualization of translocations of and interactions between signaling proteins (largely single cell assays; fluorescence microscopy, including FRET) and quantification of protein phosphorylation (mass spectrometry and immunological approaches).

d. Expand Data Analysis and Modeling Capabilities. The enormously diverse assortment of qualitatively different varieties of data obtained from RAW 264.7 cells poses a major challenge. To meet it, we are increasing efforts to organize, parameterize, and normalize these data automatically, to facilitate their interpretation in terms of the underlying signaling network. FXM data has brought us the specific challenge of undertaking module-wide interpretation of the data with quantitative mechanistic models of information flow in RAW 264.7 cells. Using FXM data as an evolving test bed, we expect to construct a preliminary FXM model within the year. Finally, we need to begin using our modeling skills and their results to direct new experiments. We will close the experiment-analysis-hypothesis loop by using preliminary quantitative models to target new measurements and perturbations necessary for understanding information flow throughout the network.

4. Prospects

We believe that the AfCS experiment has already proven itself to be a robust success. From a pie-on-the-earth perspective, as we document in this proposal, the AfCS has identified a dense web of signaling interactions triggered by pairwise combinations of many extracellular stimuli and is rapidly dissecting connectivity of FXM signaling modules containing hundreds of proteins. The AfCS website already serves the entire scientific community as an invaluable source of useful reagents (including thousands of DNA constructs), signaling information (in a database that includes thousands of molecules), lab protocols, descriptions of reagents, and experimental data from hundreds of experiments that are accessible to and being actively mined by non-AfCS investigators. These AfCS efforts are being conducted on a scale unprecedented in previous work with vertebrate cells. In the aggregate these accomplishments reflect success of an initially risky social experiment – the attempt to create a transcontinental consortium of biological laboratories collaborating effectively to achieve a common goal. The potentially unwieldy consortium has revealed a surprising capacity to meet difficult challenges, and to turn on a dime when necessary — as in the shift from primary mouse cells to the RAW 264.7 macrophage and in implementing novel RNAi technology to produce knockdown phenotypes at an impressive rate.

What about the future? Given our proven capability to deal with both organizational and procedural challenges, we predict that the AfCS project will prove even more productive in its next five years. We shall provide additional thousands of reagents to the signaling community and produce detailed, experimentally determined connectivity maps of multiple regulatory modules, comprising thousands of signaling proteins (rather than dozens). For the first time cell biologists will know the precise complements of protein isoforms in large protein families that are responsible for defined responses of a mammalian cell. Similarly, the AfCS website will display effects of extracellular stimuli on the relative concentrations of hundreds of membrane lipids in RAW 264.7 cells. We feel sure that these predictions will prove correct. Their extraordinary value to the broader community of biologists would by itself justify renewal of the AfCS project.

In addition to providing scientists with comprehensive data difficult to obtain in any other way, the AfCS has grander, more ambitious goals. Mapping and modeling information flow through signaling networks of RAW 264.7 cells will generate and test new hypotheses. Testing hypotheses that encompass large segments of the signaling network should put us on the road to understanding general principles that underlie the topologies of signaling networks in mammalian cells. Will experimental tests of qualitative and quantitative signaling network models reveal new principles of biological regulation? We answer the question affirmatively, but with qualifications. AfCS experiments will surely teach us more about structures of signaling modules than we know at present — and almost certainly more than any non-AfCS effort will learn in the same period. At the same time, we recognize that comprehensive understanding of network structures remains a lofty goal. Even our definition of this goal is likely to change and develop during the next five years. The remarkable effectiveness and adaptability of AfCS laboratories during the first funding period furnish the strongest augury for future success in pursuing these ambitious goals.

B. SPECIFIC GOALS

We extract the following major specific goals from the list (above) of challenges. These specific goals will be discussed in more detail in the remainder of this Program Summary.

  1. Determination of Molecular Pathways
    1. Interacting Molecular Partners
    2. Network Connectivity
    3. Quantification of Proteins
    4. Location of Proteins
  2. Quantitative Determination of Information Flow
  3. Data Management and Analysis
  4. Data Modeling and Network Analysis
  5. Contributions to the Signaling Research Community

C. EXPERIMENTAL STRATEGIES

1. Determination of Molecular Pathways

a. Interacting Molecular Partners. i. Identifying the players. Our focus is on signaling events initiated at the cell surface and propagated as information flow through an intracellular network of signaling proteins, ultimately resulting in changes in cellular metabolism, morphology, or function. The first challenge is to identify and characterize the potential molecular components of these networks. This process was initiated by compiling a comprehensive list of signaling proteins from literature citations. The list thus far includes nearly 4000 signaling proteins that form the core of the growing AfCS Molecule Page database. We have recruited members of the signal transduction community who study these molecules to curate and update Molecule Pages devoted to each signaling protein. The Molecule Pages contain referenced information on the function, modification, molecular interactions, and biochemical properties of the signaling protein(s). They are archived in the AfCS database and made publicly available via the AfCS-Nature Signaling Gateway (see also Section I.C.5.b.iv, below). The data contained in the Molecule Pages are in a form that allows them to be used in a variety of computational contexts. This database will be further enriched through associations with additional databases generated by other groups dedicated to working on signaling networks and to annotating the mouse genome (e.g., BIND, GRID, the Kitano Project, as well as the more canonical databases, e.g., GenBank, Ensembl, etc.) (see Sections II.C and VIII.C.2).

In addition to compiling a comprehensive list of potential signaling proteins, we need an inventory of the genes that are expressed and functioning within the RAW 264.7 cell line. This "parts list" of the RAW 264.7 cell is currently being developed by a bootstrap approach that involves a combination of literature search and experimental verification. In the first phase, literature citations were used to compile lists and simple maps of signaling pathways and proteins that were presumed to be present in the RAW cells. Extensive microarray data were then used to positively identify specific genes and transcript sequences that were present. RT-PCR was used to confirm the presence or absence of specific transcripts. We are currently using QRT-PCR to identify specific isoforms of signaling proteins and the various members of gene families that are expressed. This approach will be extended to all of the relevant microarray data. We will identify the transcripts that are found under normal growth conditions or induced or co-regulated when RAW cells are challenged with different ligands. These data will be analyzed statistically to provide levels of confidence regarding the presence or absence of specific transcripts. The resulting annotated parts lists will be an integral component of the Alliance database. The parts list will be compared to the results of similar inventories derived from other related cell lines and from primary macrophage populations.
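The bootstrap approach layers three kinds of evidence: literature, microarray, and RT-PCR. A minimal sketch of how such evidence might be combined into a single expression call is shown here; the `Evidence` class, the call labels, and the precedence rule (RT-PCR overrides microarray, which overrides literature) are assumptions for illustration and do not describe the AfCS database schema or its statistical treatment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    literature: bool = False         # cited as present in RAW cells
    microarray: bool = False         # transcript detected on an array
    rt_pcr: Optional[bool] = None    # None means not yet tested


def expression_call(ev: Evidence) -> str:
    """Return a call, letting the most direct measurement take precedence."""
    if ev.rt_pcr is True:
        return "present (RT-PCR confirmed)"
    if ev.rt_pcr is False:
        return "absent (RT-PCR negative)"
    if ev.microarray:
        return "present (microarray only)"
    if ev.literature:
        return "presumed (literature only)"
    return "no evidence"


# Hypothetical entries; the transcript names are placeholders.
parts_list = {
    "transcript_a": Evidence(literature=True, microarray=True, rt_pcr=True),
    "transcript_b": Evidence(literature=True, microarray=True, rt_pcr=False),
    "transcript_c": Evidence(literature=True),
}
for name, ev in parts_list.items():
    print(name, "->", expression_call(ev))
```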

We have assigned high priority to the identification of a subset of genes and proteins in RAW 264.7 cells that are thought to be part of the module that controls Ca2+ and PIP3 signaling (the FXM). Approximately two hundred such genes and proteins have been selected as potential components. They are the focus of current intense efforts to assemble a signaling circuit. The FXM subset of the database provides the basic information and literature background necessary both to prepare reagents for experimental studies and to develop rudimentary network maps that can be tested experimentally (Section I.C.1.b).

The process of identifying signaling proteins and components of signaling pathways is an ongoing task. While the entire genome of the mouse has been sequenced, a very significant portion of the potential open reading frames encode transcripts that generate proteins whose functions are not known. A substantial fraction of these undoubtedly play important roles in signaling. These "unassigned" genes and transcripts are represented on some of our microarrays and in our cDNA libraries. As we implicate specific transcripts in signaling processes, their annotation will be added to our growing database.

ii. Building pathways. Information processing is generally mediated by transient protein-protein interactions between specific subdomains of signaling proteins. The relative affinity of binding is modulated by: a) interactions with specific ligands; b) conformational changes; c) protein modifications; and d) molecular translocations. In addition to protein-protein interactions, the transmission of signals relies on processes that regulate the modification and transitions of signaling molecules (including non-protein molecules) from one state to another. An important tool for understanding information processing is a hypothetical signaling circuit map that incorporates the signaling molecules and describes the specific relationships between them. The maps are abstract descriptions of the temporal and spatial events that we believe to occur in the cell; they are not simply connection networks defined by possible binding partners. We initially extracted "modules" of signaling pathways from the current literature. These take the form of cartoon sketches that outline pathways showing the nature of the input signals and the roles of some of the players as deduced from current publications and from our database information. The next iteration involves assembling maps that can be used both to summarize experimental and mechanistic data and to guide formulation of hypotheses and modeling of signaling pathways. This requires that the map itself be in a form that allows data input and computation. It also requires that the map allows manipulation and evolution of pathways as data disproves or validates specific relationships. We are using the PathwayBuilder software developed by Adam Arkin to develop these maps (see Section XVI). In addition to encapsulating the nature of the physical interactions between signaling proteins, PathwayBuilder allows us to describe the relationships in terms of biochemical processes. 
During the next grant period we will link the map directly to the Alliance Molecule Page database such that new data and new relationships can be added. The PathwayBuilder map will allow us to examine and record the overall topology of a network, propose mechanisms for observed effects of perturbations, and compile kinetic parameters that can be used to constrain models of information flow through the network. Thus the map will act as a tool for integration of data and generation and testing of hypotheses.

iii. Screening for protein-protein interactions. While the literature provides potential signaling components and a vast body of data suggesting how they might be connected into specific pathways, it does not provide a complete picture of the extent or topology of network interactions. There are only a few methods that will allow us, in an unbiased, high throughput screen, to determine experimentally interactions between protein domains. We have chosen to use the yeast two-hybrid (Y2H) system as an experimental approach to generate data that can point to connections between proteins that may not have yet found their way into the literature. Thus, we are currently using the FXM protein sequences to "fish" from a large representative population of expressed protein domains those that interact with the bait sequences. The Y2H experiments are being done in collaboration with Myriad Genetics, Inc.; the Alliance provides candidate bait sequences and materials for library generation and Myriad uses their high throughput system to screen baits and exclude false positive interactions. Currently libraries derived from RAW 264.7 cell transcripts are being screened with subdomain baits derived from all of the FXM candidate signaling proteins. We anticipate by the middle of next year that we will have screened over 2,000 baits derived from more than 300 signaling protein sequences and we will have cataloged and verified approximately 2,000 interactions. These data are all available through the AfCS Web site and can be represented as networks of interactions using a variety of software tools (Fig. 1A). Some of the interactions found by the experimentally-based Y2H approach can be confirmed by references to the literature, and Fig. 1B shows examples of how screening the appropriate literature databases and molecule pages (including their links to other databases) has allowed us to further annotate the interactions uncovered by the yeast two-hybrid experiments. 
However, while the yeast two-hybrid map is useful, it also has serious limitations. Thus, it is not clear what fraction of important signaling interactions that occur at the mammalian cell membrane are captured by experiments done in yeast cells. In addition, there are many protein modifications and translocations that cannot easily be mimicked in the yeast system. On the other hand, direct methods for looking at protein complexes such as immunoprecipitation and mass spectrometry for protein identification are not easy to use in a high-throughput fashion. However, we have developed these experimental tools, and they will be applied selectively to test for critical interactions or to resolve issues raised by other experimental techniques (Section XII).

Fig 1A

Fig. 1. Visualization of the AfCS/Myriad yeast two-hybrid data using Cytoscape. A. A significant portion of the dataset can be grouped in a large interconnected network. Genes potentially involved in FXM pathways are highlighted in green. Non-FXM signaling proteins are red; those not on the AfCS protein list are purple; unknown proteins are blue. B. The network from panel A is shown with "leaf" nodes and additional extraneous nodes removed to show proteins participating in multiple interactions. A selection of legacy interactions is highlighted with red connections, demonstrating a reassuring level of overlap between our findings and existing data. Moreover, our data contain hundreds of potentially novel interactions and identify many proteins that could add another dimension to the RAW cell signaling "parts list".

Fig 1B
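The step from panel A to panel B, removing "leaf" nodes so that only proteins with multiple partners remain, can be sketched on a plain adjacency map. This is an illustrative sketch in standard-library Python, not the Cytoscape procedure used for the figure: the protein names are placeholders, self-interactions are ignored, and pruning is done in a single pass.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build an undirected adjacency map from (bait, prey) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a != b:  # self-interactions are ignored in this sketch
            graph[a].add(b)
            graph[b].add(a)
    return graph


def remove_leaves(graph):
    """Drop nodes that have exactly one partner (single pass)."""
    keep = {node for node, partners in graph.items() if len(partners) > 1}
    return {node: graph[node] & keep for node in keep}


# Hypothetical interactions: A, B, C form a triangle; D hangs off C as a leaf.
interactions = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
pruned = remove_leaves(build_graph(interactions))

print(sorted(pruned))       # ['A', 'B', 'C']
print(sorted(pruned["C"]))  # ['A', 'B']
```

A single pass can expose new leaves; repeating the call until the node set stops changing would prune those as well.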

The literature, databases, microarray and QRT-PCR experiments, and Y2H studies provide a basis for construction of hypothetical signaling pathway maps. The experimental approaches outlined in the next two sections will test and extend the network and allow us to build models of interacting signaling modules. Highly focused experiments will ultimately be used to test specific hypotheses and predictions.

b. Network Connectivity and Perturbations. The first step toward establishing connectivity in signaling networks like those targeted by the FXM is to determine whether a protein is required for a specified cellular response to an extracellular stimulus. As mentioned, we have identified ~200 proteins likely to play signaling roles in Ca2+ and PIP3 responses to FXM ligands. Once a potential target protein is confirmed as expressed in RAW 264.7 cells (by immunoblot, mRNA array, or RT-PCR), we ask whether the Ca2+ and/or PIP3 response is affected by perturbations that reduce its function. Effects of drugs or toxins may implicate classes of functionally similar proteins in a response, while altered response in an RNAi knockdown cell implicates a specific protein isoform. Within one year AfCS laboratories will have assessed ligand responses in cell lines expressing lentivirally-expressed shRNAs targeting each of 200 FXM isoforms; results to date indicate that shRNA will reduce target protein abundance by 90% or more in 50% of the lines and by 50% or more in 80% of the lines (see Progress Report and Research Plan of the Macrophage Biology Lab, below). Lentiviral expression of an shRNA has important advantages over siRNA: co-expression of the shRNA with a selectable marker makes it possible to obtain a homogeneous knockdown line that remains relatively stable for many weeks and can be stored by freezing.
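To make the reported knockdown efficiencies concrete, the fractions can be turned into expected numbers of lines. The sketch below assumes, purely for illustration, one knockdown line per FXM target and treats the quoted fractions as exact.

```python
TOTAL_LINES = 200        # assumption: one knockdown line per FXM target

# Fractions quoted in the text.
FRAC_90_OR_MORE = 0.50   # lines with target reduced by 90% or more
FRAC_50_OR_MORE = 0.80   # lines with target reduced by 50% or more

lines_90 = round(TOTAL_LINES * FRAC_90_OR_MORE)
lines_50 = round(TOTAL_LINES * FRAC_50_OR_MORE)
lines_50_to_90 = lines_50 - lines_90          # reduced by 50% to just under 90%
lines_below_50 = TOTAL_LINES - lines_50       # reduced by less than 50%

print(lines_90, lines_50_to_90, lines_below_50)  # 100 60 40
```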

Here we present an overview of FXM experiments whose details are presented in the Progress Report and Research Plan. Our ability to knock down specific isoforms creates three sets of challenges and opportunities. First, how and to what degree can we establish the validity of a signaling phenotype produced by an shRNA knockdown; that is, can we infer from a knockdown phenotype that the target protein plays a role in the signaling network responsible for the response? Second, does validated target X act up- or downstream of validated target Y? Finally, how will we apply RNAi knockdown technology to ask questions beyond the scope of the FXM in later years of this project?

i. Validating RNAi phenotypes for the FXM. It will be easier to model a signaling network if we can rely on the following simple assumption relating the shRNA knockdown to the signaling phenotype: that is, reduced signal flow through a network node whose abundance is reduced by the knockdown directly causes the observed signaling phenotype. Fig. 2 depicts questions and experiments relevant to this assumption. The first two questions are straightforward. Abundance of the gene product targeted in a knockdown line is assessed by immunoblot or, when necessary, by RT-PCR. For each target, reproducibility of phenotypes is assessed by comparing ligand responses of two (or more if necessary) knockdown lines with those of control lines infected at the same time with lentivirus lacking the shRNA sequence. Experiments designed to answer the other three questions posed in Fig. 2 require more discussion.

Fig 2

First, what about off-target phenotypes that result from infection with the virus or expression of the RNAi, but are not causally related to the knockdown itself? Such effects, described by others (1-3), may be reproducible (e.g., when the RNAi reproducibly interferes with expression of an untargeted gene) and can certainly confound attempts to infer connectivity from the RNAi phenotype. To rule out off-target effects, we shall apply three experimental approaches to every knockdown line. First, in every case where a preliminary screen identifies two RNAi sequences that effectively reduce the target protein’s concentration, we will test both sequences in independently derived knockdown lines. (If the target is reduced by only one of the four RNAi sequences initially tested, that RNAi sequence will be tested in two independently derived lentiviral lines.) Second, we compare the pattern of mRNA expression in each knockdown line with its time-matched control line to ask whether the shRNA alters expression of a reproducible cluster of genes; in addition to identifying an off-target effect, genes found in such a cluster could point to a mechanism underlying the phenotype. Third, in every case where the appropriate drug or toxin is available we shall ask whether it reproduces or attenuates the apparent knockdown phenotype. For instance, pharmacologically inhibiting PIP3 synthesis (with PI3K inhibitors like LY-294002) should reproduce the altered response seen with a knockdown targeted against a PI3K isoform critical for PIP3 accumulation, while inhibiting PIP3 degradation (e.g., with a recently described inhibitor of the PIP3 phosphatase, PTEN [4, 5]) should reproduce knockdown of PTEN itself. Conversely, the PI3K inhibitor and the PTEN inhibitor could exert a precisely opposite attenuating effect on phenotypes of knockdowns directed, respectively, against PTEN and a critical PI3K isoform.
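The checks applied to every knockdown line can be summarized as a simple checklist. The sketch below is a simplification offered for illustration, not the decision procedure of Fig. 2: the `KnockdownEvidence` fields, the `assess` function, and the flag wording are all hypothetical, and a flag marks an open question rather than a rejection (for example, a single RNAi sequence tested in two independent lines is acceptable when only one sequence reduces the target).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KnockdownEvidence:
    target_reduced: bool          # immunoblot or RT-PCR shows reduced target
    lines_with_phenotype: int     # independently derived lines showing the phenotype
    distinct_sequences: int       # distinct RNAi sequences among those lines
    offtarget_cluster: bool       # reproducible mRNA cluster altered vs. control
    drug_agrees: Optional[bool]   # None when no suitable drug or toxin exists


def assess(ev: KnockdownEvidence) -> List[str]:
    """Return the open questions for a knockdown phenotype (empty if none)."""
    flags = []
    if not ev.target_reduced:
        flags.append("target abundance not reduced")
    if ev.lines_with_phenotype < 2:
        flags.append("phenotype not reproduced in two independent lines")
    if ev.distinct_sequences < 2:
        flags.append("only one RNAi sequence tested")
    if ev.offtarget_cluster:
        flags.append("reproducible expression cluster: possible off-target effect")
    if ev.drug_agrees is False:
        flags.append("drug or toxin does not reproduce the phenotype")
    return flags


print(assess(KnockdownEvidence(True, 2, 2, False, True)))  # []
print(assess(KnockdownEvidence(True, 2, 1, False, None)))  # ['only one RNAi sequence tested']
```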

'Experiment' boxes enclosed by thick lines in Fig. 2 indicate experimental approaches to be applied to every knockdown cell line, as described above. The four experiment boxes enclosed by thin lines will be preferentially applied to knockdown phenotypes that are surprising or (if expected) essential to constructing and testing quantitative network models. These merit more detailed discussion.

Expression in a knockdown cell line of the target protein encoded by an RNAi-resistant sequence will allow us to ask whether replacing the target protein reverses the knockdown phenotype. For instance, expression of a human Gβ2 subunit reversed the phenotype produced by RNAi directed against the endogenous mouse Gβ2 sequence in a mouse macrophage cell line (6). Although this conceptually straightforward strategy will require expression of an additional construct in the knockdown line of interest, reversing the knockdown phenotype will assure us that the phenotype was caused by knockdown of the targeted protein — that is, the reversed phenotype was not due to an off-target effect of the shRNA. We would also infer that the knockdown phenotype was (probably) not due to mutation of a separate gene selected because it was required for survival in cells lacking the targeted knockdown.

An additional way to reduce the likelihood of off-target effects or selection of mutations as mechanisms underlying a knockdown phenotype will be to reproduce the phenotype by a different mechanism for knocking down the target protein — e.g., by transiently transfected siRNA (with a different sequence) or with antisense DNA. Unlikely to replicate off-target effects of an shRNA, these approaches are also less likely to allow selection of a mutation that becomes prevalent in the entire cell population, because they are necessarily applied for a shorter time, which allows fewer doublings of targeted cells. (The rapidly acting approaches carry a disadvantage, however: because only a subset of cells will be transfected with the siRNA, the phenotype must be tested microscopically in individual cells [e.g., with a Ca2+ fluorophore]).

Finally, we cannot always simply assume that a knockdown phenotype directly reflects decreased signal flow through the corresponding network node. Instead, knockdown phenotypes may reflect more complex indirect mechanisms, just as some mouse phenotypes reflect adaptation to a gene knockout over time (e.g., during development). The RAW 264.7 signaling network may subtly change its structure over time in response to knockdown of a target protein (e.g., by altering degradation or synthesis of cytoplasmic messengers, membrane lipids, or untargeted signaling proteins). Such an adaptive knockdown phenotype could obscure the important role of a target protein (by compensating for its loss) or produce an unexpected response that is not easy to rationalize on the basis of present knowledge. We shall try to rule out network adaptation in several ways. The simplest will be to replicate the knockdown phenotype (when possible) by acutely treating normal RAW 264.7 cells with a drug or toxin that specifically inhibits the protein targeted in the knockdown cell. (Note, however, that the drug and shRNA may on occasion produce quite different phenotypes even in the absence of adaptation to the shRNA; for instance, if the shRNA phenotype is caused by loss of a scaffolding function of the target, whereas the drug inhibits a catalytic function that is irrelevant to the phenotype.) Replication of the knockdown phenotype by applying antisense DNA or siRNA instead of lentiviral shRNA will also make adaptation a less likely mechanism, because these approaches allow less time for adaptation to take place. An experimentally more demanding approach, chromophore-assisted light inactivation (aka CALI [7]), allows nearly instantaneous inactivation of a protein by light-triggered local release of free radicals from a chromophore that binds to a sequence inserted into the target protein. We would infect cells with a lentivirus encoding both shRNA against the mouse sequence of protein X and a human X cDNA, into which the CALI sequence and a tag (GFP or epitope, which will ensure that the CALI protein is expressed) have been introduced. These cells should lack mouse X, express human X, and show a normal signaling phenotype, but applying CALI would rapidly inactivate the human X, allowing no time for adaptation.

In cases where adaptation cannot be ruled out as the mechanism underlying the knockdown phenotype for target X (knockdown-X), we can choose either to ignore the issue or to identify and exploit adaptation as an avenue toward understanding the signaling network. At the outset we will preferentially follow the first course, hoping that the gradually increasing number of additional knockdown lines and phenotypes will eventually reveal a pattern that explains the knockdown-X phenotype. For instance, finding that an effect we predicted from knockdown-X is instead produced by knocking down Y, a similar but distinct isoform, will strongly suggest that Y rather than X is rate-limiting for the signaling response tested. When simple explanations fail, we shall search for unexpected or postulated explanations. For instance, DNA arrays could reveal an unexpected over-expression of a second mRNA or cluster of mRNAs that compensate for loss of protein X. Alternatively, we may design experiments to test the idea that the knockdown-X phenotype depends upon a change in protein Y (e.g., enhanced degradation resulting from inability to bind to X).

The last example indicates a more general set of ways to understand any knockdown phenotype: that is, constructing and testing a plausible hypothesis. Such hypotheses and experiments will of course vary with the knockdown and will not be detailed here. It is worth pointing out, however, that AfCS labs are developing a useful tool for testing certain hypotheses: that is, the ability to knock down two target proteins in the same cell (see Research Plan of the Macrophage Biology Lab). This approach will allow us, for instance, to determine whether the partial signaling defect of a cell lacking protein X is made more complete by simultaneous knockdown of Y; if the double knockdown produces a stronger phenotype, X and Y may perform the same signaling function (e.g., Gαq and Gα11) or play essential roles in two distinct pathways, each of which contributes a part of the response. (Double knockdowns would be considerably easier if we simultaneously transfected cells with two siRNAs; single cell assays will lend themselves to such an approach. We also hope to improve transfection protocols to permit high efficiency transfection of cell populations.)

ii. Epistasis and pathway mapping. Even the most definitive knockdown phenotype cannot precisely define the corresponding protein’s epistatic relation to other proteins in the network. If target proteins X and Y are both associated with a knockdown phenotype, we shall want to determine whether they act in a direct signaling pathway or interact through branches or loops, and whether one acts upstream of the other. To do so, we shall employ several strategies. Our initial models will for the most part rely on the simple assumption that X and Y relate to one another precisely as they are reported to do in other cells (e.g., Gαq and Gα11 perform the same function in parallel). In other cases, we can draw inferences about epistasis from effects of simultaneously expressing an shRNA against X with other constructs; examples might include dominant-interfering mutants of Y (negative or positive) or shRNA directed against Y (in double knockdowns, as described above).

Useful inferences about epistasis between X and Y will also come from measuring activities at intermediate signaling nodes in the network. Thus, if knockdowns of X and Y each reduce ligand-triggered phosphorylation of Z we shall suspect that X and Y both act upstream of Z. In contrast, if knockdown of X reduces ligand-triggered phosphorylation of Z (but not Q), while knockdown of Y reduces that of Q (but not Z), we shall suspect that X and Y act in distinct pathways. Intermediate or complex results will not be surprising, but combinations of these approaches can provide reasonably satisfying maps of epistasis relations among target proteins in FXM networks.
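The inference rules in the preceding paragraph can be written down as a small decision procedure. The sketch below is illustrative only: the function name, the boolean encoding of "knockdown reduces the readout," and the example readouts are assumptions introduced here, not an AfCS analysis pipeline.

```python
def classify_epistasis(effects_x, effects_y):
    """Apply the two simple inference rules to knockdown data.

    effects_x / effects_y: dict mapping an intermediate readout (e.g. "Z")
    to True if that knockdown reduces the ligand-triggered readout.
    The encoding is a hypothetical simplification of real dose-response data.
    """
    reduced_x = {node for node, reduced in effects_x.items() if reduced}
    reduced_y = {node for node, reduced in effects_y.items() if reduced}
    shared = reduced_x & reduced_y

    # Both knockdowns reduce exactly the same readouts: suspect X and Y
    # both act upstream of those nodes.
    if shared and reduced_x == reduced_y:
        return "both upstream of " + ", ".join(sorted(shared))
    # Each knockdown reduces a readout the other leaves intact:
    # suspect distinct pathways.
    if reduced_x and reduced_y and not shared:
        return "distinct pathways"
    # Anything else falls into the "intermediate or complex" category.
    return "intermediate or complex"


same = classify_epistasis({"Z": True, "Q": False}, {"Z": True, "Q": False})
split = classify_epistasis({"Z": True, "Q": False}, {"Z": False, "Q": True})
mixed = classify_epistasis({"Z": True, "Q": True}, {"Z": True, "Q": False})
```

In practice each readout would be a graded, time-dependent measurement rather than a yes/no call, so a rule of this kind can only flag candidates for the combined approaches described above.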

iii. 'Completing' the FXM. The FXM, even as narrowly defined here (Ca2+ and PIP3 responses to three ligands), should proceed well beyond completing shRNA knockdowns of the first 200 proteins on the FXM list. Measurements of Ca2+ accumulation in knockdown lines responding to FXM ligands already hint at unexpected effects of reducing G protein subunits on Ca2+ responses to IgG2a, suggesting that pathways downstream of trimeric G proteins interact at some level with those initiated by activation of Fcγ-R1. If confirmed, these findings will point to surprising and unsuspected connections in the RAW 264.7 network. Further FXM experiments will extend our understanding of network connectivity by assessing effects of: (a) simultaneous knockdowns of two or more target proteins in one cell line; or (b) knockdowns of additional potential contributors to the network, discovered by yeast two-hybrid analysis, immunoprecipitation, and other approaches in AfCS and other laboratories. Most important, we plan to devise, test, refine, and expand quantitative models of information flow through the Ca2+/PIP3 network. This will require applying assays designed to measure signal flow through key intermediate nodes of the network. Such assays will include phosphorylation of specific proteins (e.g., of Syk in response to IgG2a, or of C5a receptors in response to C5a); translocation of GFP-tagged proteins between cellular compartments (e.g., translocation of PI3K isoforms to the plasma membrane, in response to C5a or IgG2a); and FRET between network components that associate or dissociate in response to ligand (e.g., arrestins or GRKs interacting with the appropriate GPCR). For instance, we will want to dissect the roles of IP3 synthesis from later events involved in IP3-mediated Ca2+ accumulation. As described in the Microscopy Laboratory's research plan (Section XIII), cytosolic translocation of the XFP-tagged PH domain of PLCδ1 can serve as a probe for phospholipase C activation in individual living cells; measuring its translocation simultaneously with Ca2+ accumulation or a second translocation event (with a probe tagged with a different XFP) will provide the correlated time-dependent data that will prove especially useful in modeling. Each of these issues is discussed in greater detail in the Progress Report and Research Plans, below.

iv. Extending the FXM approach beyond its present narrow confines. Toward the end of the present funding period, the FXM approach will begin to extend into additional portions of the RAW 264.7 signaling network. In choosing additional target networks for extending the FXM, the Steering and Macrophage Committees have applied five criteria. First, the targeted network domain should produce responses that are biologically important. Second, like cytoplasmic Ca2+, the primary response to be measured should be quantitative, robustly reproducible, and easy to measure in moderately high throughput mode. Third, this response should be elicited by multiple ligands in RAW 264.7 cells, with pairwise combinations of different ligands producing non-additive effects. Fourth, the targeted network domain should be of manageable size and complexity. Fifth, signaling expertise in the relevant area should be easily identified within the AfCS. Recognizing the impossibility of choosing targets that perfectly meet all these criteria, we have chosen two candidate directions for extending the FXM: (a) responses downstream of a trimeric G protein, Gs; (b) release of cytokines. Both meet the requirement for availability of expertise within the AfCS superbly: for Gs and cAMP, AfCS aficionados abound (Bourne, Gilman, Ross, Sternweis, Simon, Taussig, and many others); for cytokines the same is true (Aderem, E. Brown, Seaman, and K. Smith). Gs/cAMP and cytokine release meet the other four criteria in different ways, as described below.

Before describing these two directions for extending the FXM, it is worth emphasizing that the approach to each will follow well-established steps. The first three of these steps are already well under way in our experiments with Ca2+ and PIP3. These include: (a) defining the key central assay(s) for measuring time- and concentration-dependent responses of control cells to an appropriate subset of RAW 264.7 ligands; (b) preparing a literature-based signaling map and 'parts list' of key proteins probably involved in the network, followed by testing for expression of these proteins in RAW 264.7 cells; (c) assessing effects of shRNA-based knockdowns of key regulatory proteins, to establish the initial connectivity map. The first knockdowns to be tested will be those for which lentiviral shRNA-expressing cell lines were already prepared in the course of the present FXM effort; then we will proceed to new knockdown lines targeting relevant proteins not included in the original FXM list. (Note: producing knockdown lines at the present rate, AfCS labs can create ~1,000 different knockdowns during the next five years. Methods for applying genome-wide screens will also be available in the next few years; see Section XI.) Finally, we shall apply the data obtained in the first three of these steps to: (d) devising and applying assays for activities of key intermediate signals in each module (see Section c, below), and (e) constructing quantitative models of the responsible networks and refining them by experiment, in repeated cycles.

The Gs/cAMP module will focus on responses to two major ligands that elevate cAMP in RAW 264.7 cells, isoproterenol and PGE1, plus two interacting ligands, sphingosine-1-phosphate (S1P) and UDP, which the ligand screen showed to exert little or no cAMP-elevating effect on their own but to synergize with both isoproterenol and PGE1. We purposely confine this network target to responses within the first 20 min after addition of ligand and to network nodes upstream of cAMP accumulation and PKA (e.g., receptors, receptor kinases, arrestins, G protein subunits, RGS proteins, adenylyl cyclases, phosphodiesterases, and PKA isoforms). This relatively restricted scope, parallel to that of the Ca2+/PIP3 FXM, will reduce the number of network proteins in the Gs module to ~150 and will make the experimental effort more manageable. Later, the Gs effort can expand to include kinases and cellular responses downstream of cAMP (e.g., inhibition of chemotaxis or alteration of cytokine secretion). The biological importance of the Gs/cAMP module reflects its ubiquity and versatility in regulating diverse functions of virtually all cells in the body, including cells of the innate immune system. Moreover, cAMP accumulation in all cells is exquisitely regulated in size, time course, and subcellular distribution. We know many proteins likely to be involved in this regulation, and more are certain to be discovered. We do not, however, understand the signaling network(s) that connect(s) these proteins or the flow of information through them that accounts for precision of cAMP regulation.

As an FXM extension target, cAMP accumulation is especially attractive, for several reasons. (a) cAMP is easy to measure precisely and in relatively high throughput, as already shown in AfCS ligand screens. (b) These screens have also revealed heterogeneity in the responses to isoproterenol and PGE1 (that is, different time courses, quantitatively different maximal cAMP accumulation, and non-identical sets of interactions with other ligands). (c) Most important, the two synergizing ligands may act by different mechanisms: S1P appears not to act via an effect on Ca2+ accumulation, whereas we suspect that the UDP effect is mediated by Ca2+.
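One simple way to express the synergy discussed above is to compare the response to a ligand pair with the sum of the single-ligand responses. The sketch below assumes a plain additive reference model (other reference models exist), and every number in it is a hypothetical placeholder rather than AfCS screen data.

```python
def excess_over_additivity(basal, resp_a, resp_b, resp_ab):
    """Return observed minus expected response for a ligand pair.

    Expected response assumes simple additivity of the increments over
    basal (an assumed reference model). A positive value indicates
    synergy; a negative value indicates less-than-additive interaction.
    """
    expected = (resp_a - basal) + (resp_b - basal)
    observed = resp_ab - basal
    return observed - expected


# Hypothetical cAMP values (arbitrary units): a ligand with little effect
# on its own that markedly enhances the response to a second ligand.
excess = excess_over_additivity(basal=1.0, resp_a=10.0, resp_b=1.2, resp_ab=20.0)
is_synergistic = excess > 0
```

A real analysis would also need replicate measurements and an error estimate before calling an interaction non-additive.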

The Gs/cAMP module serves as a 'horizontal' extension of the present FXM, by increasing the numbers of GPCRs studied to at least six (C5a receptor, two P2Y receptors, the β2-adrenergic receptor, plus one or more prostaglandin [EP] and Edg receptors) and the number of trimeric G protein families to three (Gi, Gq, and now Gs). For now we have decided against formal horizontal extension to the remaining family of trimeric G proteins, G12/13. This is because we have not unequivocally identified the requisite robust signal downstream of a receptor that activates G12 and G13 in RAW 264.7 cells. We will, however, test the effects of Gα12 and Gα13 knockdowns on all the responses and ligand interactions, and may find ourselves studying one or both of these G proteins as well. In fact, preliminary observations suggest that Gα13 may indeed play a critical role in the synergistic effect of S1P on isoproterenol-stimulated cAMP synthesis.

Choice of the cytokine release module reflects both the well known biological importance of macrophage cytokines in host defense and discoveries by the AfCS two-ligand screen that cytokine secretion represents a set of robust and easily measured responses with fascinating interactions among pairs of AfCS ligands: that is, marked inhibition or potentiation of the response to one ligand by addition of a second. The cytokine and Gs/cAMP extensions of the FXM differ in several ways. The latter module extends the FXM horizontally (to an additional set of GPCRs and G proteins), while cytokine release extends the FXM in a 'vertical' dimension: cytokine responses involve ligands with receptors and downstream signals different from one another and from those in the Gs/cAMP module. In addition: cytokines are released at a relatively slow rate, so that output measurements are usually performed 2 or more hours after application of ligands; using cytokine release as an output would expand the FXM's scope to include extremely complex cell functions and signaling nodes in the network downstream of the receptors (e.g., synthesis of specific RNA messages and/or proteins, proteolytic cleavage of precursor proteins, and secretion). These differences make cytokine release attractive as a target that will markedly extend the scope of the Alliance effort. At the same time these differences present considerable complexity and many experimental difficulties, requiring that we carefully define scope and design of the experimental approaches.

Consequently, we shall confine cytokine release experiments to a relatively narrow scope, studying (at least at the outset) a few pairs of ligands that reveal reproducible inhibitory or potentiating interactions in regulating cytokine release. For instance, two cAMP-elevating ligands, isoproterenol and PGE1, strikingly inhibit or potentiate effects of Toll-like receptor ligands, depending on which cytokine’s release is measured. Thus, the cAMP-elevating ligands inhibit LPS-stimulated release of Rantes, MIP1α, and TNF, but potentiate LPS-stimulated release of GCSF (Section II.A.1). Focusing initially on these interactions would provide a nice opportunity to leverage information obtained in experiments directed at the Gs/cAMP module.

Just as important, with each pair of receptor ligands a primary goal will be to find reproducible assays for relevant signals that change more rapidly (not later than 20 min after ligand addition) than does release of the relevant cytokines. The parts list of proteins thought to mediate responses to the ligands will furnish candidate 'rapidly responding' nodes upstream of cytokine release (e.g., activation of a kinase or phosphorylation of a key intermediate). Several protein phosphorylation events have already been detected in RAW cells following application of ligands for Toll-like receptors, and a few substrates for PKA have also been detected. The most relevant upstream nodes will show the following characteristics: (a) location in the signaling pathway near the receptor(s) for the interacting ligands; (b) probable importance in mediating the cytokine release response to the ligand, based on information from the literature; and (c) amenability to perturbation by drugs or by shRNA knockdown of protein abundance. Any protein will constitute a candidate signaling node for interaction of the two pathways if knockdown or pharmacological inhibition of that node affects both the cAMP and LPS responses or – still better – blocks or enhances the inhibition of LPS responses by cAMP-elevating agents. We would consequently focus our experimental attention on signaling events upstream of such a signaling node, but continue to use the cytokine assay as a readout for perturbation of these upstream events.

It will not be easy to apportion resources among extensions of the FXM. From the armchair perspective, cAMP responses and cytokine release appear to possess different potential assets and liabilities. In the real world of the laboratory, the results of experiments will of course guide us.

Finally, during the next 5 years it may prove possible to extend the FXM approach even further — that is, to domains of the RAW 264.7 signaling network beyond those that regulate Ca2+, PIP3, cAMP, and cytokine release. Candidates may include: (a) the numerous ligand interactions already found in the double-ligand screen, some of which were not previously known (see Progress Report, Section II.A); (b) rapidly accumulating data on ligand-induced changes in lipid composition of RAW 264.7 cells (see Progress Report and Research Plan of the Lipidomics Lab); (c) other candidate responses whose robustness is currently being tested in AfCS labs (e.g., cell motility, macropinocytosis, and phagocytosis).

c. Quantification of Proteins. We made brief mention of the need to quantify signaling proteins of interest in our original proposal, but we have not yet initiated this effort because of other priorities. Rapid changes in technology now make this challenge more tractable. Our initial proposal involved quantitative immunoblotting coupled with strategies for producing tagged standards for absolute quantification. We will now rely predominantly on mass spectrometry.

The need for these measurements derives from the desire to know both the relative abundances of interacting species and their absolute amounts or concentrations. The situation with G protein heterotrimers serves as an example. G protein α subunits are degraded rapidly in the absence of the stabilizing Gβγ complex. If three isoforms of Gβ are expressed in RAW cells at a relative abundance of 10:1:1, knockdown of the abundant species will likely have a profound secondary effect on the concentrations of several Gα and Gγ subunits, depending on their relative abundance. Removal of less abundant Gβ species will likely have major effects on the concentrations of other subunits only if their interactions are specific and mandatory. Similarly, if a complex between X and Y is critical for cell function and the relative abundance of X and Y is 1:10, removal of 90% of Y may not greatly impair formation of the XY complex.
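The X:Y example can be made concrete with a simple 1:1 binding equilibrium. The sketch below assumes a single reversible reaction X + Y ⇌ XY with dissociation constant Kd; the concentrations and Kd values are hypothetical and chosen only to show how the outcome depends on affinity.

```python
from math import sqrt


def complex_concentration(x_total, y_total, kd):
    """Equilibrium [XY] for X + Y <-> XY (same units for all arguments).

    Solves the quadratic that follows from mass conservation and
    Kd = [X][Y]/[XY]; the smaller root is the physical one.
    """
    s = x_total + y_total + kd
    return (s - sqrt(s * s - 4.0 * x_total * y_total)) / 2.0


# Hypothetical units: X = 1, Y = 10 (the 1:10 ratio in the text).
tight_before = complex_concentration(1.0, 10.0, kd=0.01)  # high affinity
tight_after = complex_concentration(1.0, 1.0, kd=0.01)    # 90% of Y removed
weak_before = complex_concentration(1.0, 10.0, kd=1.0)    # lower affinity
weak_after = complex_concentration(1.0, 1.0, kd=1.0)
```

With the assumed high-affinity Kd, about 90% of X remains in the complex after the knockdown, whereas with the weaker Kd the complex falls from roughly 0.9 to under 0.4. This is why relative abundance alone does not settle the question and absolute concentrations are needed.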

For quantitative analysis of information flow, the need to know precise and accurate local concentrations of individual proteins depends on whether they act predominantly catalytically, whether they act stoichiometrically as allosteric regulators or scaffolds, or whether their activities depend on such stoichiometric interactions. In the first case, net local activity, determined in arbitrary units, may be adequate for quantitative evaluation and modeling of the network. In the latter two cases, standard modeling approaches require concentrations. In these cases, high-affinity interactions among components of similar total concentrations can lead to complex, non-linear behaviors where small changes in relative concentration can change outputs dramatically. We will concentrate our efforts on identifying the most important of these difficult cases, and we will also target development of intracellular sensors to provide functional measures of the outputs of these concentration-dependent activities.

Because the AfCS will predominantly be concerned with measurements made over relatively short times, changes in the concentration of most intracellular proteins will not be an issue (other than changes of individual molecular species caused by covalent modification). Nevertheless, changes in protein concentrations that might result from processes such as endocytosis or regulated proteolysis can be monitored. The Absolute Quantitation of peptides (AQUA peptide) strategy is best suited to our needs (see Section XII). This approach uses heavy isotope-containing reference peptides added to cell lysates to provide absolute standards for mass spectrometric measurement of individual proteins. Analysis can be performed with or without prior fractionation, as appropriate, and the same strategy can be used to quantify covalent modification of specific residues. Specific plans are discussed in more detail in Section XII. Alternative approaches involving immunoblotting are described in Section XIV.B.1.
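The arithmetic behind an isotope-labeled reference peptide measurement is straightforward: the ratio of native (light) to reference (heavy) peak areas, multiplied by the known amount of heavy peptide added, gives the amount of native peptide. The sketch below illustrates that calculation only; the function name, the input values, and the assumption that one peptide is recovered per protein molecule are ours, not details taken from Section XII.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_cell(light_area, heavy_area, heavy_spike_fmol, cells_in_lysate):
    """Estimate protein copies per cell from a light/heavy peak-area ratio.

    Assumes (hypothetically) complete digestion and one target peptide
    per protein molecule, so peptide amount equals protein amount.
    """
    if heavy_area <= 0 or cells_in_lysate <= 0:
        raise ValueError("heavy_area and cells_in_lysate must be positive")
    native_fmol = (light_area / heavy_area) * heavy_spike_fmol
    molecules = native_fmol * 1e-15 * AVOGADRO  # fmol -> mol -> molecules
    return molecules / cells_in_lysate


# Hypothetical run: light peak twice the heavy peak, 50 fmol heavy peptide
# added to a lysate prepared from one million cells.
estimate = copies_per_cell(light_area=2.0, heavy_area=1.0,
                           heavy_spike_fmol=50.0, cells_in_lysate=1e6)
```

Converting copies per cell to a concentration would further require an estimate of cell (or compartment) volume, which this sketch does not attempt.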

d. Localization of Proteins. The task of assessing the cellular location of signaling proteins will be addressed by dual fluorescence microscopy (Section XIII). Briefly, proteins tagged with yellow fluorescent protein (eYFP) will be ectopically expressed in RAW cells and their location determined by correlation with co-expressed subcellular marker proteins tagged with cyan fluorescent protein (CFP) (8). Two complementary fusion constructs, which place the fluorescent tag on either the N- or the C-terminus of the target protein, will be expressed. In some cases, failure to reach a consensus localization for a protein using these two constructs may be due to mislocalization caused by addition of the fluorescent tags – for example by misplacement of amino- or carboxyl-terminal signals for covalent modifications that direct membrane localization. Where possible, information derived from legacy data should aid in resolving differences in N- and C-tagged fusions. Attempts to visualize the native protein may also be made if appropriate antibodies are available. We believe that mislocalization due to excessive over-expression of these constructs will not be a common problem, since RAW cells have proven to be resistant to such over-expression. In the event that we are unable to resolve discrepancies in the localization patterns of the two tagged fusions, it may be necessary to carry both possible cellular locations through the modeling efforts.
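One common way to score "correlation with a co-expressed marker" is the Pearson correlation between pixel intensities in the two channels. The sketch below assumes that approach for illustration; the source does not specify the correlation measure, and the pixel values and marker names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no correlation")
    return cov / sqrt(var_a * var_b)


def score_markers(experiments):
    """experiments: marker name -> (yfp_pixels, cfp_pixels) from one cell.

    Returns the best-correlated marker and all scores.
    """
    scores = {name: pearson(yfp, cfp) for name, (yfp, cfp) in experiments.items()}
    return max(scores, key=scores.get), scores


# Hypothetical flattened pixel intensities from two co-expression experiments.
experiments = {
    "plasma_membrane": ([10, 50, 200, 220, 15, 12], [12, 40, 180, 240, 20, 10]),
    "nucleus": ([15, 40, 210, 190, 12, 18], [200, 180, 20, 10, 190, 210]),
}
best, scores = score_markers(experiments)
```

A score of this kind would be computed separately for the N- and C-terminally tagged fusions, so that disagreement between the two constructs shows up as different best-scoring markers.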

In addition to the static localization of signaling proteins in cells under resting (unstimulated) conditions, the redistribution of these proteins following ligand treatment underlies important spatial and temporal dynamics in the propagation of signals through the network. Ligand-induced redistribution of Akt, PLC, BTK and other proteins has been described (9-12). These methodologies have been adapted by the AfCS to assess the C5a ligand-induced redistribution of PH domains isolated from 136 proteins, and we have demonstrated the ligand-induced translocation of 9 known and 1 novel PH domains (see Sections II.H.C and XIII.B.1). Similar fluorescence microscopy approaches that rely on the ability to track the real-time movement of YFP-tagged proteins and their correlated localization with co-expressed CFP-tagged cellular markers will be employed for the FXM signaling proteins.

These translocation assays will be used in two basic experimental settings. First, this technology will be adapted to provide a medium-throughput platform with which to develop additional translocation biosensors. These biosensors will be a necessary addition to the repertoire of cellular probes needed to quantify the information flow through signaling networks following stimulation by ligands. For this approach, constructs encoding GFP-fusion proteins (derived from a subset of the hundreds of full-length cDNAs produced by the AfCS during the current funding period) will be screened for possible translocation in response to a cocktail of ligands. Second, the approaches will permit quantitative analysis of the time course of translocation, as well as the effects of perturbations at distinct nodes of the signaling pathway on the movement of tagged proteins.
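As one illustration of what quantitative analysis of a translocation time course might yield, the sketch below estimates the time to half-maximal translocation by linear interpolation. The choice of this summary statistic, the function name, and the sample trace are assumptions made here for illustration.

```python
def half_time(times, signal):
    """Time at which the signal first reaches half of its maximal rise.

    times / signal: equal-length sequences; signal[0] is taken as baseline.
    Linear interpolation is used between the two bracketing samples.
    """
    if len(times) != len(signal) or len(signal) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    base, peak = signal[0], max(signal)
    if peak <= base:
        raise ValueError("signal never rises above baseline")
    half = base + 0.5 * (peak - base)
    for i in range(1, len(signal)):
        if signal[i] >= half:
            t0, t1 = times[i - 1], times[i]
            s0, s1 = signal[i - 1], signal[i]
            return t0 + (half - s0) * (t1 - t0) / (s1 - s0)


# Hypothetical membrane/cytosol fluorescence ratio sampled every 10 s.
t_half = half_time([0, 10, 20, 30, 40], [1.0, 1.2, 1.8, 2.0, 2.0])
```

Comparing such a value between control and knockdown lines is one way a perturbation at a given node could be related to the movement of a tagged protein.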

2. Quantitative Determination of Information Flow

The ultimate goal of the AfCS is to understand quantitatively how information flows through the RAW 264.7 cell signaling network in response to individual and combined inputs, and how this flow leads to cellular behaviors. This challenge demands experimental and computational innovations.

The complexity of cellular signaling networks is characterized by simultaneous use of multiple distinct biochemical mechanisms, related members of multi-gene families, and splice products of single genes. This apparent redundancy permits (1) partitioning of signaling events to distinct subcellular localization, (2) distinct controls for inputs that lead to the same endpoint, (3) controlled additivity or non-additivity of parallel or similar pathways, (4) variably non-linear response functions, (5) robustness to disruption by untoward inputs, (6) distinct patterns of response kinetics and/or amplitude and (7) the ability to developmentally or adaptively alter individual ligand-response relationships differentially.

Such beautifully engineered architecture frustrates most contemporary approaches to systems-level analysis of cellular signaling. The AfCS plans to meet this challenge head-on. After mapping the RAW cell's signaling network, including redundant components and pathways (above), we propose to determine quantitatively which signaling functions within the network are called in response to what inputs. To do so, we will measure activity at as many key nodes as possible, in real time, continuously or nearly so, and with minimal unintended perturbation. We expect that distinct inputs will channel information through distinct and/or overlapping paths. Understanding both the patterns of information flow and their determinants will be the major AfCS achievement over the next funding period.

a. Probes and Assays Currently Available. We will expand the use of high- and medium-throughput, quantitative cellular probes of signaling activity as our knowledge of the signaling network develops. Most useful probes fall into four general classes. Details are provided in Sections IX and XIII.

i. Fluorescent probes of concentration or activity. In addition to Fluo3, a synthetic fluorescent Ca2+ indicator already in use, we will apply several sensors of protein phosphorylation, activation of monomeric GTP-binding proteins, and other signaling intermediates. Each allows real-time monitoring of the activation or concentration of specific cellular signaling components with excellent temporal resolution and, when monitored microscopically, spatial resolution as well. Protein kinase sensors are good examples (13-15). They consist of a CFP FRET donor and YFP acceptor, a linker peptide that contains a phosphorylation site (selective for one or a few kinases), and an adjacent phosphoprotein-binding domain (PPBD) (see Fig. 3). Phosphorylation of the linker causes intramolecular binding of the PPBD to the phosphoamino acid. A tether can be added to drive organelle-specific localization. Such sensors monitor the activity of kinase-phosphatase pairs according to a change in FRET upon phosphorylation of the linker peptide. Selectivity among kinases depends on the design of the linker sequence; selectivity for phosphatases is less well understood, but adequate phosphatase activity is needed for reporting kinase action. Selectivity and affinity of the PPBD are lesser issues because binding is intramolecular. Similar probes for the activation of monomeric GTP-binding proteins are also available (16, 17). They separate the CFP and YFP moieties with a selective G protein binding domain (GBD) and a second domain that binds (or releases) the GBD when it has bound activated G protein. Targetable FRET sensors have also been developed for Ca2+ (18, 19) and cyclic nucleotides (20, 21). Intramolecular FRET biosensors can be used microscopically at the single cell level or, after stable expression, in multi-cell microwell formats. Dual infection with lentivirus vectors or the use of IRES-containing vectors permits use of sensors with either shRNA constructs or mutant signaling proteins. Both procedures are currently used in AfCS laboratories.

ii. FRET and BRET probes of protein-protein association. Regulated protein-protein binding will be assayed by intermolecular FRET or BRET (bioluminescence resonance energy transfer; see Section VII). Many such probes are available, and construction of new probes is not difficult, although their calibration and validation will be a substantial undertaking. For FRET, the most common donor-acceptor pair is CFP and YFP, each fused to one of the binding partners and modified to optimize optical properties and prevent intrinsic tendency to dimerize. BRET probes, where the donor is Renilla luciferase and the acceptor is a GFP, are constructed similarly and are best suited to donors present at low concentrations (22, 23). In cases where a GFP or luciferase moiety interferes with binding, we will use tetra-cysteine peptides designed to react with FLASH or ReASH, which are selectively reactive fluorophores that are added as protected, cell-permeant reagents after the target protein has been expressed (24). Intermolecular FRET/BRET signals are proportional to fractional formation of the two-protein complex, although existence of one or both free molecules in excess requires sophisticated calibration.

iii. Fluorescent probes of cellular translocation (see Section XIII). Many proteins (or their isolated interaction domains) can be fused with GFP or a FLASH-reactive tetracysteine peptide to monitor their intracellular localization or movement. Movement may reflect multiple sorts of regulation, including relocalization of the binding partner. An example is the PH domain of Akt fused to YFP, which we now use to monitor PIP3 formation in the plasma membrane (Section II.H.). Other binding domains used in this way include PTB, SH2, and C1 domains; targets include phospho-proteins, lipids, GTP binding proteins, etc. Movement is detected microscopically. In addition to real-time microscopy, informative proteins for which fluorescent constructs are not available can be localized and measured by immunocytochemistry after fixation.

iv. Biochemical quench-and-assay probes. Many traditional biochemical assays combine information value with temporal acuity, sensitivity, and throughput such that they provide excellent probes of real-time signal flow. We will continue to measure phosphorylation of key proteins by both mass spectrometry and immunoblotting. Sample quenching is rapid and can be automated, assays are established, and throughput is good. New mass spectrometric methods will further increase throughput, provide information simultaneously on multiple phosphorylated residues, and improve quantitation (see Section XII). Several phosphoproteins can likely be measured in a single experiment, often with absolute quantitation, and peptides from control phosphoproteins normalize for experimental variation. The appearance of phosphorylated sites can be directly linked to the regulated activities of individual protein kinases. While absolute quantitation of protein kinase activity requires knowledge of protein phosphatase activity, rates of change of phosphorylation provide important information on network dynamics.

Parallel measurement of cellular lipids provides similar quantitative information on multiple reactions with excellent time resolution (Section XV). While sample throughput is relatively low, lipid assays are multiplexed and information output is enormous. Because both anabolic and catabolic reactions are known, changes in lipid levels give direct measurements of the enzymes involved. As standards become available, absolute quantitation also becomes possible. Further, since discrete organellar pools of phospholipids may be distinguished by their side chain compositions, the subcellular locations of individual reactions can also be inferred from side chain profiles.

Measurement of total cyclic AMP also remains a good example because it maps to regulation of adenylyl cyclases and upstream regulators and multiple phosphodiesterases that have distinct sensitivities to pharmacological inhibitors and regulators.

b. Standardizing, Normalizing, and Interpreting Data. Intracellular fluorescence probes provide exquisite temporal resolution and acuity, but at the expense of absolute biochemical quantitation. Absolute concentrations can sometimes be referenced to a few microinjected samples when needed, but fluorescence data are routinely recorded as fractional responses. Intramolecular FRET can be standardized further according to wavelength ratios. The donor:acceptor molar ratio is always unity and fluorescence data are directly related to fractional effect. Intermolecular FRET signals are more difficult to standardize, but dual wavelength measurements can correct for spectral spillover unless donor:acceptor ratios are far from unity. Biochemical analyses are expressed in molar terms. After initial normalization to allow comparison among experiments, responses are analyzed as log-fractions with weighting for experimental error, essentially the method developed for ligand screen data (Section II.A.1). Fortunately, many of the most important data will convey rates of change, which can be expressed as first-order rate constants.
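
As an illustration only, the following Python sketch shows how a raw fluorescence trace might be converted to a fractional response and then summarized as a first-order rate constant. The function names, the synthetic trace, and the assumption of a single-exponential approach to maximum are introduced here for the example; they do not describe the actual AfCS analysis code.

```python
import math

def fractional_response(signal, baseline, maximum):
    """Scale raw readings to a fractional response between 0 and 1."""
    span = maximum - baseline
    return [(s - baseline) / span for s in signal]

def first_order_rate_constant(times, fractions):
    """Least-squares slope of ln(1 - f) against t, forced through the origin.

    Assumes f(t) = 1 - exp(-k t), so that ln(1 - f) = -k t.
    """
    pairs = [(t, math.log(1.0 - f))
             for t, f in zip(times, fractions)
             if t > 0.0 and 0.0 <= f < 1.0]
    numerator = sum(t * y for t, y in pairs)
    denominator = sum(t * t for t, _ in pairs)
    return -numerator / denominator

# Synthetic trace (arbitrary fluorescence units) generated with k = 0.5 per second.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
raw = [100.0 + 400.0 * (1.0 - math.exp(-0.5 * t)) for t in times]

fractions = fractional_response(raw, baseline=100.0, maximum=500.0)
k = first_order_rate_constant(times, fractions)
print(f"estimated first-order rate constant: {k:.3f} per second")
```

Real traces would carry noise, so a weighted fit over replicate measurements would be more appropriate than this noise-free example.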

Essentially all probes measure a quantity of material rather than an activity per se (i.e., amount of phosphoprotein, not the activities of the relevant kinases and phosphatases). The accumulation of a phosphoprotein, a multi-protein complex, or an activated G protein does provide information on the rates of the formative and degradative reactions. Inhibitors, dominant mutations, and RNAi knockdowns will be used to isolate individual reactions kinetically or, more readily, to shift the balance of equilibria or cycles. Many signaling intermediates have known activities, such that their concentrations can provide inferred activities. Design and interpretation of these experiments will depend heavily on interactions between quantitative modeling and investigators' experience to extract the maximal amount of information from the data.

A caution to the use and interpretation of cellular sensors is their ability to perturb the very reactions that they are meant to measure. A high-affinity sensor must be expressed at a level significantly below that of its binding site or below that at which its catalytic activity alters signaling. The sensitivity of fluorescence measurements generally allows sensors to be expressed at such low levels. Standard controls for perturbation by sensors include monitoring both nearby signaling reactions and cellular responses to ligands with and without the sensor. Perversely, while high-affinity sensors are intuitively attractive, high affinity plus low-level expression can limit dynamic range, and low-affinity probes are often preferable. They do not perturb equilibria because they sample only a small fraction of the target protein determined by their fractional binding. However, they must produce extremely low signals while in their "off" states because only a small fraction is "on". These conditions are not often met for FRET-based sensors, but BRET-based sensors, single-fluorophore sensors, and relocalization sensors fulfill these criteria. We will routinely control expression levels using inducible promoters already tested in the Molecular Biology Laboratory, and we will modulate affinities as necessary with surface mutations. We will codify and standardize these considerations to speed probe development. In addition to providing constraining parameters for computational modeling of signaling modules and networks, the sensors will provide both qualitative and quantitative data that are intuitively accessible to biologists. These data will be displayed in searchable form to allow others to evaluate interesting individual reactions or groups of reactions.
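
The trade-off between affinity and perturbation can be made concrete with the standard equilibrium expression for a 1:1 complex. The sketch below is a simplified illustration under stated assumptions: a single binding site, no competing partners, and hypothetical concentrations and dissociation constants in arbitrary but consistent units.

```python
import math

def complex_concentration(sensor_total, target_total, kd):
    """Equilibrium concentration of a 1:1 sensor-target complex.

    Solves C**2 - (S + T + Kd) * C + S * T = 0 for the physically
    meaningful (smaller) root.
    """
    b = sensor_total + target_total + kd
    return (b - math.sqrt(b * b - 4.0 * sensor_total * target_total)) / 2.0

def target_fraction_bound(sensor_total, target_total, kd):
    """Fraction of the target protein sequestered by the sensor."""
    return complex_concentration(sensor_total, target_total, kd) / target_total

# Hypothetical case: sensor and target both present at 1.0 unit.
high_affinity = target_fraction_bound(1.0, 1.0, kd=0.01)
low_affinity = target_fraction_bound(1.0, 1.0, kd=10.0)

print(f"high-affinity sensor sequesters {high_affinity:.0%} of the target")
print(f"low-affinity sensor sequesters {low_affinity:.0%} of the target")
```

In this toy case the high-affinity sensor ties up most of the target, whereas the low-affinity sensor samples only a small fraction of it, which also means that only a small fraction of the sensor is in its "on" state.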

c. Development of Novel Sensors. While the number of available intracellular sensors is growing rapidly, we will surely need sensors that are not yet available. It is obvious, however, that we cannot construct and validate sensors for even a major fraction of the reactions in the RAW 264.7 cell signaling network. We will use a continual cycle of experiment, modeling, and discussion to nominate candidate nodes in the signaling network for monitoring with novel sensors.

i. Placing the sensors. The AfCS laboratories have the capacity to produce more biosensors than can be validated, and we must make choices. Six approaches will be pursued in parallel and will be monitored closely for relative success rates. Advances in modeling and network theory should further streamline the process. 1) We will draw on the insight of the Macrophage Committee and associated scientists. Legacy knowledge of pathways and useful experiments will be a valuable guide. 2) Modeling small modules (the Ca2+ component of the FXM, for example) often supports some sort of sensitivity analysis, in which the dependence of outputs can be assigned to a few major reactions. Even though small modules can become too complex for classical sensitivity analysis, Monte Carlo searches of parameter space and analysis of surrogate polynomial models can often pinpoint key reactions (see Section XVI). 3) Within a module in which multiple small "black boxes" can be identified, sequential examination of all possible connections within a box can often eliminate a large fraction of the possibilities and thus allow us to focus on the most likely important reactions. 4) The response surface techniques described by the DMNA core (Section XVI) can efficiently determine how a feature of model behavior depends on parameters even for relatively large models. 5) In some cases where stoichiometry is known and where branching is limited, metabolic control theory can indicate appropriate monitoring points. This approach may be particularly useful for lipid metabolism. 6) The formal approaches described in Section XVI provide model-based predictions of which measurements are likely to give the best estimates of model parameters or to distinguish among competing models for a process. We will combine these strategies with consideration of technical feasibility to prioritize our choices.
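
To illustrate the idea of a Monte Carlo search of parameter space, the sketch below samples parameters of a toy module at random and ranks them by how strongly each correlates with the module output. The model, the parameter names (including the deliberately irrelevant `k_unrelated`), and the use of a simple correlation score are assumptions made for this example; the methods actually proposed are those described in Section XVI.

```python
import math
import random

def toy_steady_state(params):
    """Hypothetical module: phosphorylated fraction set by one kinase-phosphatase pair."""
    return params["k_kinase"] / (params["k_kinase"] + params["k_phosphatase"])

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_parameters(model, names, n_samples=2000, seed=0):
    """Sample each parameter log-uniformly over 0.1 to 10 and score its influence."""
    rng = random.Random(seed)
    samples = [{name: 10 ** rng.uniform(-1.0, 1.0) for name in names}
               for _ in range(n_samples)]
    outputs = [model(p) for p in samples]
    scores = {}
    for name in names:
        log_values = [math.log10(p[name]) for p in samples]
        scores[name] = abs(pearson(log_values, outputs))
    # Most influential parameter first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = rank_parameters(
    toy_steady_state, ["k_kinase", "k_phosphatase", "k_unrelated"]
)
for name, score in ranking:
    print(f"{name}: |correlation with output| = {score:.2f}")
```

Reactions whose parameters score highest in such a search would be natural candidates for monitoring with a sensor, subject to technical feasibility.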

ii. Sensor development. Based on criteria of applicability and need, we will develop new biosensors to help probe the RAW cell signaling network. We will initially focus on relocalization sensors in the Microscopy Laboratory and on FRET- and BRET-based phosphorylation sensors in the Cell Laboratory (Sections IX and XIII). We will focus our efforts on standardizing the structures of the substrate/linker domains to allow substitution of kinase- and/or phosphatase-recognition sites without needing to re-engineer the linker for each construct. We will also focus on the choice of relatively promiscuous PPBDs to allow us to use only a few PPBDs in combination with multiple phosphorylation sites. We will also replace FRET with BRET in cases where low-level expression or variable donor-acceptor stoichiometry make FRET hard to interpret. We expect that both the novel probes and general protocols for standardization will be a valuable contribution to the entire signaling community.

3. Data Management, Analysis, and Bioinformatics

The AfCS will generate the most comprehensive set of data available to date on a single type of mammalian cell. This affords the research community a unique opportunity to perturb, analyze, reconstruct, and model the cell and its parts at a systemic level. During the first five years of the project the Data Management, Analysis, and Bioinformatics (DMAB) Laboratory created a unique bioinformatics infrastructure that, using the web as the front end to sophisticated databases, provides data in a user-friendly manner to the biological research community. In the second five years we will continue to enhance this data management infrastructure and create a software environment for data interoperability and integrative analysis. Our broad objectives include the following.

These objectives will be achieved as a result of strong communication with all other AfCS laboratories.

a. Data Management. i. Data acquisition and entry. New assay development is a high priority for the AfCS Laboratories. The DMAB Laboratory will continue to maintain the Laboratory Information Management System (LIMS) and develop new bar code schema and GUIs for new assays, always interacting directly with the relevant experimental laboratories.

ii. Data storage. We currently store AfCS experimental data in Oracle relational databases in the central AfCS bioinformatics servers at the San Diego Supercomputer Center. For new types of experiments we will create new schema, Oracle databases, and scripts for loading and storing data. We will continue to use best-practices programming styles that will be reflected in modular, scalable, and extensible software engineering. We will also provide the documentation required for replication of our entire data system at other sites.

iii. Display and dissemination of data. We will continue to provide immediate and unrestricted access to validated AfCS experimental data through the AfCS-Nature Signaling Gateway. The raw data from most experiments are presented in annotated tab-delimited files for point-and-click downloading. In addition, it is our mandate to present the data in formats that are easily accessible to biologists. Time-series data will be presented as plots along with basic statistics. We will display interaction maps through "webdot" diagrams, as well as downloadable images. For FXM experiments, including knockdowns of specific gene products in the FXM pathways, we will employ PathwayBuilder in collaboration with the Data Modeling and Network Analysis (DMNA) Lab (Berkeley) (see Section XVI).

iv. Curation and statistical analysis. The DMAB Laboratory will provide both automated statistical analysis of data and a core set of standard statistics tools for the other laboratories to use, as desired. Our automated tools will calculate means and variances and display correlation and heat maps (from clustering) to give a first glimpse of data similarity and variation. Given the multiple dependent variables in our experimental measurements, simple t-tests will not be adequate, and we will perform ANOVA (analysis of variance) to assess the significance of variables systematically. Beyond this, we will employ two approaches for data analysis. In the first, an unsupervised learning approach, we will use both clustering and classification methods. In the second, in collaboration with the DMNA Lab, we will validate AfCS data in the context of specific network models.
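
As a minimal illustration of the ANOVA calculation, the sketch below computes the F statistic for a single factor from first principles. The group labels and measurements are invented for the example, and a one-way design is assumed for brevity; experiments with several factors would call for a multi-way analysis in a dedicated statistics package.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of measurement groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual measurements around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical responses under three ligand conditions.
control = [1.0, 2.0, 3.0]
ligand_a = [2.0, 3.0, 4.0]
ligand_b = [5.0, 6.0, 7.0]

f_statistic = one_way_anova_f([control, ligand_a, ligand_b])
print(f"F = {f_statistic:.2f}")
```

The F statistic would then be compared with the F distribution for the corresponding degrees of freedom to obtain a significance level.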

b. Enhancements to the Molecule Page Database. The Molecule Page database (MPDB) is a web-based resource for creation, curation, and dissemination of a large body of structured knowledge about signaling molecules. To date, we have created a very detailed ontology (on which the MPDB is based), extensive applications for author entry and peer review, and user interfaces for display of and querying the MPDB. We anticipate dynamic growth of the MPDB, and the hope is that the authors will update each Molecule Page annually. This requires continued development and maintenance of the database. We will also add features to the MPDB. These are described in Section VIII.C.5, and will include an edit tracking system, new data entry interfaces, sequence-structure visualization and editing tools, reaction schemes, further development of the user query interface, and integration of additional databases in the public domain.

c. Database Interoperability and Data Integration. A major challenge for the DMAB Laboratory is the ability to provide seamless interoperability between different AfCS data types and tools for integration of data. The AfCS data will only be useful if researchers can formulate queries across databases and have access to tools for integrating many types of data. We describe below our approach to database interoperability and data integration.

i. The parts-list database and interoperability system. Conceptually, AfCS data have several dimensions: perturbation inputs, outputs from many experimental assays, and context-specific parts lists of proteins and small molecules (e.g., lipids). Interoperability between data spanning these different dimensions is essential for interrogating and querying these data sets. The response to queries that relate these different data dimensions yields biological knowledge. To achieve interoperability in this multidimensional, multi-database environment, we will use the mouse proteome parts list as a reference point. Single molecule data, data involving pairs of interacting proteins, and data involving networks of interactions will be referenced through components (defined states of proteins) in the parts list database.

Our ultimate goal is to infer signaling pathways and networks from these data. Our core database (the Parts List database), which will have pointers to the AfCS databases, will be the mouse proteome (derived from the mouse genome). Input dimensions such as ligands will be related through their native receptors; output dimensions such as enzyme substrates will be related through proteins in associated pathways. We will develop a large application layer that will communicate with the relevant AfCS databases and carry out given relational computations. A part of this application layer will involve parsed SQL queries, while others will involve procedural computations across data from different tables. For instances that require repeated and often sought relationships (e.g., canned queries), we will create materialized views that will enable rapid access to the response. In other words, the Parts List database will serve as the core elements for relating different types of data and signaling networks. The interoperability system will contain the business layer application scripts that will parse queries and compute data relationships across tables in distinct databases.
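
The role of the parts list as a common reference can be sketched with a small relational example. The code below uses Python's built-in `sqlite3` module purely as a stand-in for the Oracle databases; all table names, column names, and records are hypothetical, and an ordinary view is used where the production system would use a materialized view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Core parts list: one row per defined state of a protein.
CREATE TABLE parts_list (part_id INTEGER PRIMARY KEY, protein TEXT, state TEXT);

-- Two independent data sets, each referenced through part_id.
CREATE TABLE phospho_assay (part_id INTEGER, ligand TEXT, fold_change REAL);
CREATE TABLE interactions (part_id INTEGER, partner TEXT);

INSERT INTO parts_list VALUES (1, 'ProteinA', 'phosphorylated');
INSERT INTO parts_list VALUES (2, 'ProteinB', 'unmodified');
INSERT INTO phospho_assay VALUES (1, 'LigandX', 2.5);
INSERT INTO phospho_assay VALUES (2, 'LigandX', 1.0);
INSERT INTO interactions VALUES (1, 'ProteinC');
INSERT INTO interactions VALUES (1, 'ProteinD');

-- A "canned query": interaction partners of ligand-responsive proteins.
CREATE VIEW responsive_partners AS
SELECT p.protein, a.ligand, a.fold_change, i.partner
FROM parts_list p
JOIN phospho_assay a ON a.part_id = p.part_id
JOIN interactions i ON i.part_id = p.part_id
WHERE a.fold_change >= 2.0;
""")

rows = conn.execute(
    "SELECT protein, partner FROM responsive_partners ORDER BY partner"
).fetchall()
for protein, partner in rows:
    print(protein, "interacts with", partner)
```

Because both data tables point to the same `part_id`, a query can relate assay output to interaction data without either table knowing about the other.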

ii. The query system. Our system will accommodate three categories of queries. In the first, users will be able to query individual types of data, such as ligand screen phosphorylation patterns or yeast two-hybrid interactions. The query interfaces will have customized standard queries, and the results will be displayed in user-friendly formats. In the second, a user will be able to pose queries across a selected set of databases. For example, it would be possible to ask which interaction partners of a protein in the FXM network would be targets for a knockdown experiment, or which substrates are known for different isoforms of protein kinase C. In the third, we will provide a multi-tiered interface that will allow a user to create complex queries through Boolean combinations of different types of data. We anticipate that this will enable "what if" types of questions, and the responses will provide valuable biological knowledge. Our query system will be built on the data integration strategy described below.

The integration system will contain a client application layer, a mediator, an agent layer, and a display API. The client application layer will house the query input GUIs and the query result display interfaces. The mediator will provide an integrated virtual view of all the source databases, store meta-data that will help look up service agents, and host a compute engine that will parse the query into elemental queries that can be mapped on to individual agents. The mediator will also select an optimized execution plan. The agents are launched from the task manager component of the mediator that oversees agent deployment and action. The agent layer will be further subdivided into data agents and analysis agents. The former will be a traditional wrapper that will communicate with the source database and retrieve data. An analysis agent is a wrapper around analysis tools and can act on databases, parsed results, and output modalities. To make our display interfaces versatile, we will develop display application programmer interfaces that can be used in a modular fashion to construct display interfaces specific to customized data displays. When a user interface issues a query, the mediator will use the meta-data to determine which data sources hold the information. It will then parse the query into sub-queries, launch the appropriate agent(s), and handle the display of the query result using modules from the API. We describe this approach in detail in Section VIII.C.6.
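
The division of labor between the mediator and its agents can be sketched as follows. This is a deliberately minimal illustration: the class names, the in-memory records that stand in for source databases, and the dictionary used as a meta-data catalog are all assumptions for the example, and query optimization, analysis agents, and the display API are omitted.

```python
class DataAgent:
    """Wrapper around one source database (here, an in-memory list of records)."""

    def __init__(self, records):
        self.records = records

    def run(self, protein):
        """Retrieve every record that mentions the requested protein."""
        return [r for r in self.records if r["protein"] == protein]


class Mediator:
    """Looks up agents from meta-data and splits a query into sub-queries."""

    def __init__(self):
        self.catalog = {}  # meta-data: data type -> agent that serves it

    def register(self, data_type, agent):
        self.catalog[data_type] = agent

    def query(self, protein, data_types):
        # One elemental sub-query per data type that has a registered agent.
        sub_queries = [(dt, protein) for dt in data_types if dt in self.catalog]
        return {dt: self.catalog[dt].run(p) for dt, p in sub_queries}


mediator = Mediator()
mediator.register("phosphorylation", DataAgent([
    {"protein": "ProteinA", "ligand": "LigandX", "fold_change": 2.5},
]))
mediator.register("interaction", DataAgent([
    {"protein": "ProteinA", "partner": "ProteinC"},
    {"protein": "ProteinB", "partner": "ProteinD"},
]))

# "lipid" has no registered agent in this example, so it is skipped.
result = mediator.query("ProteinA", ["phosphorylation", "interaction", "lipid"])
for data_type, records in result.items():
    print(data_type, records)
```

Keeping the catalog separate from the agents means that a new data source can be added by registering one more agent, without changing the client layer.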

d. Linking AfCS Data to Modeling Efforts: The Signaling Database and the Meta Data Catalog. The AfCS will devise testable qualitative and quantitative models of signaling networks. Towards this end, the DMNA Laboratory (Berkeley) will develop a biochemical pathway modeling strategy (discussed in the following section) based on PathwayBuilder, which has been developed as a component of the BioSpice project. It allows us to create, edit, and model biochemical networks based upon a combination of AfCS experimental data and legacy knowledge. It is thus essential that the AfCS databases communicate effectively with PathwayBuilder and other pathway GUIs and tools.

The Molecule Page database provides a structured representation of molecules, their post-translational modification, and complexation states; the chemical transformations they are involved in; and even kinetic and thermodynamic information about these transformations. The database is populated with high-quality data curated by AfCS members and with imported information from external databases such as BIND. These data can be semantically mapped to the representation of pathways and pathway models in PathwayBuilder. PathwayBuilder will be able to import this information from the Molecule Page database using SQL queries through the PathwayBuilder database interface. The Molecule Page database thus will provide a substantive starting point for the modeling efforts of the DMNA Lab. Results of the DMNA Lab in parameter estimation and model structure will also be evaluated for importation into the Molecule Page database.

AfCS data are extremely heterogeneous, and developing suitable export formats is a challenging problem. The export tools will make use of existing data structures and languages where possible. Whereas plain text formats are most convenient for experimental data, Molecule Page data will be exported as XML. Many research groups have already developed XML vocabularies that will be evaluated for inclusion within our export formats. To maximize the usefulness of the export, a Simple Object Access Protocol (SOAP) version of the query interface will be made available and described in a web services description language (WSDL) document. For collated data across AfCS databases, we will create data views through meta-data catalogs and develop XML/SBML models that will allow connecting data and annotations to nodes and edges in pathway models.

To facilitate storage and dissemination of AfCS reconstructed biochemical networks, we will develop a Signaling Database that will contain the pathway models (along with an SBML exchange format), the components of the pathway, pair-wise interactions in terms of the biochemical process that links the pair, phenotypic properties of the entire network and sub-networks, and related, comparative models when available. The Signaling Database will be in an Oracle relational database format and will be interoperable with other AfCS databases. We describe this in more detail in Section VIII.

4. Data Modeling and Network Analysis

a. Introduction. The newly created Data Modeling and Network Analysis (DMNA) Laboratory seeks to build model-based tools to facilitate extraction and testing of hypotheses about how signals propagate in the large-scale signaling networks of interest to the AfCS. There are two main subtasks of the laboratory: 1) the data modeling effort is designed to work with the DMAB Lab to extract key features of the data that should be predicted from network models and to infer new interactions among network components from the data and the literature; and 2) the network analysis effort is designed to deeply curate legacy information about the signal transduction pathways into computable models of network dynamics that can be directly compared to the AfCS data. The laboratory will then provide innovative methods for quantifying the consistency of the experimental observations with the network model and for ranking models of alternative hypotheses. These analyses will enumerate the parts of the model that are unconstrained by the data (meaning that there are not enough data to know whether the model is correct in that area) and the parts that are inconsistent with the data. The laboratory will develop formal and informal methods for optimal experimental design to discriminate between alternative network hypotheses and to clear up inconsistencies or critical under-constraint in the model.

This effort is critical to understanding signal flows in these networks. We argue that the number of network constituents and their interactions and the complexity of the data being collected is such that reasoning about these networks without model-based computational tools will be difficult at best. Below, we outline the approach of the DMNA Laboratory to creating the tools that will ease this process.

b. Approach. The aims of this core are to integrate the information about the molecular pathways and information flow in the signaling networks into a form that may be rigorously analyzed to quantify consistency of the model assertions with other experimental observations. This entails a number of tasks: 1) extracting from the literature and a subset of AfCS data the relevant signal transduction network structure, that is, the molecular players and their interactions; 2) qualitatively annotating this pathway with the evidence for the existence of each species and its interaction with other species; 3) quantitatively annotating the pathways with relative or absolute molecular abundance ranges and assignment of biochemical mechanism and parameter ranges to each of the interaction processes; 4) analyzing the model both by simulation and by parametric dependency analysis (sensitivity, bifurcation, etc.) to define the range of behaviors that are physiologically possible; 5) formally comparing model predictions to data to discover inconsistencies either in the model assertions or the data, to discriminate and rank alternative model hypotheses, and to design experiments to resolve model hypotheses, ambiguities, and inconsistencies.

That is, we wish to explain, using the causal assertions in the mathematical signal transduction model, the temporal changes in chemical activity as a function of input of different concentrations of ligand under different RNAi knockdown conditions. If the model does not admit explanation or admits too many explanations, we want to be able to design experiments effective at discriminating among alternative hypotheses or that best constrain the existing model. It is this last task on which the most research needs to be accomplished; however, it is exceedingly difficult to accomplish without the strong infrastructure provided by the previous tasks.

Here we briefly describe our approaches to each of the tasks above. These will be further detailed in Section XVI.

c. Activity 1: Data Analysis and Statistical Modeling. Data for the model analysis described below need to be collected carefully and then processed in a number of ways specific to some of our proposed methods. Time series of molecular abundances (relative or otherwise) are the most common data type collected by the Alliance. These time series are constructed from immunoblots detecting protein phosphorylation, fluorescence measurements of Ca2+ concentration, gene expression measurements, secretion of cytokines, etc. A new time series is collected for each perturbation by receptor ligands and may be derived from a wild-type cell or one treated with one or more specific RNAis or other perturbants. For the model/data comparison tools we are developing, it is sometimes useful to reduce these time series to a collection of features, such as peak Ca2+ concentration, peak width, time to peak concentration, etc. Our goal, then, is to use some of these data to parameterize models (determine values for their kinetic parameters) and others to show where a model is inadequate. The methods that will be used in both cases require two things: first, accurate estimation of the relative changes of either a species' abundance/activity in time or the value of a "feature" under different conditions; second, accurate estimation of the uncertainty (e.g., variance) in repeated measures of this series. Physiologically, and for comparison to models, it is important to differentiate between mean and variable cell behavior; there is variability in the Ca2+ response measured from a population of cells, in the response of a single cell, and across different single cells. In collaboration with the DMAB Lab, we will create statistically robust estimators for each of the mean responses and their variability in each of these cases. Some of the basic methods for using these features are discussed in Section XVI.B.
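
A simple version of the feature reduction described above might look like the following Python sketch. The function name, the sample-resolution estimate of peak width at half maximum, and the invented traces are assumptions for illustration; the statistically robust estimators mentioned above would replace the plain mean and variance used here.

```python
import statistics

def extract_features(times, values, baseline=0.0):
    """Reduce a single-peaked time series to peak, time to peak, and width.

    Width is the span between the first and last samples at or above half
    of the peak height, so its resolution is limited by the sampling interval.
    """
    peak_index = max(range(len(values)), key=lambda i: values[i])
    peak = values[peak_index]
    half_height = baseline + 0.5 * (peak - baseline)
    above = [t for t, v in zip(times, values) if v >= half_height]
    return {
        "peak": peak,
        "time_to_peak": times[peak_index],
        "width_at_half_max": above[-1] - above[0],
    }

# Hypothetical Ca2+ trace (arbitrary units) sampled once per second.
times = [0, 1, 2, 3, 4, 5]
trace = [0, 2, 10, 6, 3, 1]
features = extract_features(times, trace)
print(features)

# Hypothetical peak values from three replicate experiments.
replicate_peaks = [10, 12, 11]
mean_peak = statistics.mean(replicate_peaks)
variance_peak = statistics.variance(replicate_peaks)  # sample variance
print(f"peak: mean = {mean_peak}, variance = {variance_peak}")
```

Each feature, together with its estimated uncertainty, can then be compared with the corresponding quantity predicted by a model.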

The second main effort is to derive different model hypotheses directly from data and to add model hypotheses to the highly curated pathway models derived in the effort described above. The AfCS yeast two-hybrid experiments are one direct source for the addition of putative interactions to the curated models. The second source of model hypotheses comes from what is essentially correlative analysis in which the responses of different species in the network are cross-correlated and a measure of "interaction" among these species is made. While the DMAB Lab will make most of these statistical efforts, the DMNA Lab cites their development and describes application of such methods in Section XVI.

d. Activity 2: Qualitative and Quantitative Annotation of Signaling Pathways. The representation of a molecular pathway is still a matter of some art and choice. The level of detail at which to represent each possible state of every molecule and the processes that bring about their interconversions is still a matter of debate. However, with the participation of the signal transduction biologists in the AfCS, we have an unprecedented opportunity to create a comprehensive, quality-controlled, and quantitatively annotated representation of signal transduction networks in immune cells. To aid these biologists in the principled construction and annotation of these pathways such that they may be fruitfully analyzed, we have introduced a tool for formal capture of knowledge about pathway structure, mechanism, and dynamics called PathwayBuilder. In brief, the tools are beginning to allow a group of remotely located, collaborating scientists to simultaneously edit a pathway at whatever level of detail seems appropriate at the moment. Every object (molecule, complex, cellular superstructure) in the pathway may be annotated with external web and literature references, comments, relative or absolute initial concentrations, etc. Every process (chemical reaction, transport process, mechanical shape change) in the pathway can also be annotated with references and comments. In addition, each process can be assigned a type from a user-extensible ontology of process types, a mathematical model if one is available for that process type, and relative or absolute ranges for the parameter of each model. PathwayBuilder pathways can be exported as a specialized XML, as SBML (a model description language), and a variety of other formats to aid in exchange of pathway knowledge and computational analysis of pathways.

PathwayBuilder can also be run as a module of the larger open-source BioSPICE model analysis software (http://biospice.org), which links it to a large repository of model analysis tools such as parameter estimators, natural language processing tools for extraction of interactions from the literature, and model sensitivity analysis tools.

We are extending the collaboration and annotation capability of PathwayBuilder to serve AfCS needs. We will be working with the DMAB Lab to allow direct query and import of molecular data into PathwayBuilder. A formal process for submission of PathwayBuilder annotation back to the DMAB Lab will be developed. We are also extending the process ontology and the process-model repository to allow better annotation and more flexible mathematical modeling of the pathways, including a facility to annotate and model spatial information about the system. We plan to use PathwayBuilder and the associated tools with the AfCS to develop the best and most deeply curated signal transduction map and model ever built. We hope to develop a process for deep annotation of pathways in analogy to how groups of experts annotate genomes together in so-called "jamborees". Because of the scope of the problem, we will be starting the process by assembling experts on a small, manageable part of the signal transduction network (a few hundred proteins and interactions). The product will be a family of pathway models annotated with evidence for every molecule, complex, and molecular state in the system. This will be detailed with hypothetical mechanisms and parameter ranges for each process, as well as a set of hypotheses about the system that need to be tested. If successful, these jamborees will be expanded to larger parts of the network. The tools and approaches for collaborative model building are discussed in Section XVI.B, C, and D.

e. Activity 3: Model Simulation and Analysis. When every process in a network representation in PathwayBuilder has an associated model and parameter set, the tool is designed to output models of different classes (including ordinary differential equations, differential algebraic equations, stochastic differential equations, partial differential equations, and the chemical master equation). The models can be output in a few different formats, including optimized C code, MATLAB code, and SBML. While the process-models available for some of these mathematical classes are sparse, we are currently populating them with models expected to be useful in analysis of AfCS pathways. In these formats we have access to a number of tools for simulating these models, for fitting parameters of these models to data, and for testing the sensitivity of models to the values of these parameters (linear sensitivity analysis, response surface analysis, and some bifurcation tools). We will describe in Section XVI how parameter over- and under-fitting will be dealt with and how such issues as relative and absolute units for concentrations and kinetic parameters are handled. Sensitivity analyses, which yield estimates of which parameters of the model most affect its behavior, are critical to the model/data comparison and experimental design efforts. The different approaches we take here will be described in Section XVI.
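To make the idea of linear sensitivity analysis concrete, the following is a minimal sketch and not PathwayBuilder output. It assumes a hypothetical one-species activation model, dx/dt = k_on(1 - x) - k_off x, with made-up rate constants, integrates it with a fixed-step fourth-order Runge-Kutta scheme, and estimates the normalized sensitivity of the final state to each parameter by central finite differences.

```python
def simulate(params, t_end=10.0, dt=0.01):
    """Integrate dx/dt = k_on*(1 - x) - k_off*x from x(0) = 0 using RK4.

    The model and its parameters are hypothetical; x is the active
    fraction of a single signaling species.
    """
    k_on, k_off = params["k_on"], params["k_off"]

    def rate(x):
        return k_on * (1.0 - x) - k_off * x

    x = 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = rate(x)
        k2 = rate(x + 0.5 * dt * k1)
        k3 = rate(x + 0.5 * dt * k2)
        k4 = rate(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


def relative_sensitivities(params, rel_step=1e-3):
    """Estimate d ln x(t_end) / d ln p for each parameter p.

    Uses a central finite difference with a relative perturbation.
    """
    base = simulate(params)
    result = {}
    for name, value in params.items():
        up = dict(params, **{name: value * (1.0 + rel_step)})
        down = dict(params, **{name: value * (1.0 - rel_step)})
        dx_dp = (simulate(up) - simulate(down)) / (2.0 * rel_step * value)
        result[name] = dx_dp * value / base
    return result


params = {"k_on": 1.0, "k_off": 0.5}  # illustrative values only
final_state = simulate(params)
sensitivities = relative_sensitivities(params)
```

For this toy model the steady state is k_on / (k_on + k_off), so the two sensitivities should be close to +1/3 and -1/3. Parameters with sensitivities near zero are the ones a given measurement cannot constrain.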

f. Activity 4: Model Data Comparison and Experimental Design. The use of models from task 4 is to provide a substrate for testing hypotheses about network function and dynamics. This task will develop new formal methods for four interrelated model-based data analyses (see Section XVI): 1) model invalidation, 2) model parameter estimation, 3) model discrimination, and 4) optimal experimental design. Analyses 1 and 2 provide feedback, given uncertainties in model parameters and primary measurements, on which parts of a model violate the data constraints and which are completely unconstrained (and therefore not tested by the experiments). These methods, as described in Section XVI, provide experimental designs for improving identification of parameters and for resolving inconsistencies between a model and existing data. Analyses 3 and 4 are geared to determining which of a family of possible models is most consistent with the data. Because data will be initially sparse, and because alternative hypotheses about how signals flow in the network have already been generated during the initial pathway curation efforts, we expect there will be a number of models that are consistent with whatever the current dataset is. Model discrimination tools can assign likelihoods to each model, thus giving feedback on which is "most" consistent with the current dataset. However, the possible models, which generally differ in only a fraction of their species and reactions, will each be consistent with the data over possibly distinguishable parameter ranges. That is, some concentrations and rates for pieces of one model will not be the same as those for the same species and reactions in another model. Thus it will be possible to design experiments to discriminate between the two models. The laboratory will work with those designing probes to monitor information flow such that sensors will be placed optimally to maximize the ability to discriminate between alternative hypotheses.
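The logic of model discrimination and discrimination-oriented design can be sketched as follows. This is an illustration under stated assumptions, not the formal methods of Section XVI: the two candidate models, the observations, and the noise level are invented. Measurement error is assumed to be independent and Gaussian with known standard deviation, and the "design" step simply picks the candidate time point at which the two models disagree most.

```python
from math import exp, log, pi


def log_likelihood(predicted, observed, sigma):
    """Gaussian log-likelihood of observations given model predictions."""
    return sum(
        -0.5 * log(2.0 * pi * sigma ** 2) - (o - p) ** 2 / (2.0 * sigma ** 2)
        for p, o in zip(predicted, observed)
    )


def model_weights(predictions, observed, sigma):
    """Normalize the likelihoods of competing models to relative weights."""
    logs = {
        name: log_likelihood(pred, observed, sigma)
        for name, pred in predictions.items()
    }
    top = max(logs.values())  # subtract the maximum for numerical stability
    unnormalized = {name: exp(value - top) for name, value in logs.items()}
    total = sum(unnormalized.values())
    return {name: u / total for name, u in unnormalized.items()}


def most_discriminating_time(times, model_a, model_b):
    """Pick the time at which two models' predictions differ most."""
    return max(times, key=lambda t: abs(model_a(t) - model_b(t)))


def saturating(t):
    """Hypothetical model 1: response saturates with time."""
    return t / (1.0 + t)


def linear(t):
    """Hypothetical model 2: response grows linearly with time."""
    return 0.12 * t


times = [0.0, 1.0, 2.0, 4.0, 8.0]
observed = [0.02, 0.48, 0.70, 0.88, 0.97]  # made-up measurements
predictions = {
    "saturating": [saturating(t) for t in times],
    "linear": [linear(t) for t in times],
}
weights = model_weights(predictions, observed, sigma=0.1)
next_time = most_discriminating_time(times, saturating, linear)
```

Here `weights` indicates which candidate is more consistent with the current data, and `next_time` is a crude stand-in for optimal design: it identifies where an additional or more precise measurement would best separate the two hypotheses.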

In total, these efforts represent a tight experimental/computational loop that directs experiments to refine and validate models of cellular function.

5. Contributions to the Signaling Research Community

We have made very substantial commitments to reach out to our colleagues in the signaling research community – supplying them with a variety of products to keep them informed of our progress, to facilitate their own research efforts, and to engage them in our endeavor. Our future plans include continuation and/or expansion of most of these efforts, and it thus seems simplest to discuss accomplishments in this area and future plans together in this section.

a. Information about AfCS Progress. i. Signaling Gateway (www.signaling-gateway.org). The AfCS-Nature Signaling Gateway, a collaboration between the AfCS and the Nature Publishing Group (NPG), is our primary interface with the research community. Content is contributed by both the editorial staff at NPG (Signaling Update section: summaries of recent articles, library of important papers, reviews, news, jobs, conferences, etc.) and the AfCS (Data Center, Molecule Pages, About Us). The Signaling Gateway won the Association of Learned and Professional Society Publishers (ALPSP) Award for Publishing Innovation. In addition, according to a news article written by Electronic Publishing Services:

As an editorial task it is a very successful story of network publishing. Hosted on the San Diego supercomputer complex, edited in Durham NC, Dallas, and London, and initiated by Nature editors living on site in San Diego for six months, it is also a case study in resource management and remote team working.

Use of the site continues to grow. As of May 2004, NPG sends a weekly Signaling Update E-alert to more than 100,000 unique active addresses. The site has more than 70,000 registered users, and there are more than 100,000 weekly page views. Google rankings (position of AfCS hits in Google searches) for such terms as "cell signaling" and even the word "molecule" are extremely high (#3 and #12, respectively). See AfCS.org/rev for a presentation on these issues prepared by NPG.

ii. Newsletter (AfCS.org/rev). A newsletter is written three or four times yearly, posted on the Signaling Gateway, and distributed directly to all AfCS members (investigators, sponsors, Molecule Page authors). Email notification of newsletter publication (with a link to the newsletter) is sent directly to those registered users of the Gateway who have indicated their willingness to receive news of the AfCS (and who can opt out of such direct mail distribution at any time).

iii. Research Reports/Publications (AfCS.org/rev). The communication of interesting findings or advances by the AfCS is achieved both by publication in peer-reviewed journals when appropriate and also through publication on our Web site. The AfCS Research Reports are electronic publications that offer a mechanism to inform the public rapidly of progress, technical advancements, or analyses with the aim of facilitating public use and interpretation of our data. Reports are carefully edited and internally reviewed by the AfCS to maintain consistency and quality. Readers can find links to raw data, detailed protocols, and printable versions of the reports.

b. Products to Facilitate the Research of Individual Investigators. i. Plasmid database and reagents (AfCS.org/rev). Using Invitrogen Gateway technology, the AfCS has generated over 5000 DNA constructs from the combination of 1500 cloned, sequence-verified gene sequences and 200 unique parent vectors. The latter range from vectors for tagging proteins with GFP variants and epitopes to lentiviral vectors used for expression of shRNAs. All AfCS constructs are recorded in a detailed web-based plasmid database, and this is mirrored at a publicly accessible site that displays those constructs that are available from the ATCC (currently approximately 3000). This resource will of course continue to expand.

ii. Antibody database (AfCS.org/rev). Results from all AfCS testing of various antibodies, both conventional and phosphospecific, are tabulated in a searchable database. Widespread use of this resource is perhaps already saving NIH an amount equal to their investment in the AfCS and might (should) put at least one company out of business.

iii. Yeast two-hybrid screen (AfCS.org/rev). We have screened hundreds of baits derived from major signaling proteins against libraries prepared from B cells, cardiac myocytes, and macrophages. All of this information is tabulated for public access and should be serving as an enormous source of leads for pursuit by individual investigators.

iv. Molecule Page database (AfCS.org/rev). The starting point for the Molecule Page database is the AfCS list of nearly 4000 proteins involved in signaling reactions (AfCS.org/rev). Each of these molecules becomes the subject of a Molecule Page, composed of two major sections: automated data and author-entered data. Automated data are extracted from public databases and updated frequently: database links, domains and motifs, structure, gene information, orthologs and paralogs, and BLAST data. Expert authors, selected by an Editorial Board (Pat Casey, Chair), enter literature-derived information on the states of the protein, mechanisms for transitions between states, and function. This information is captured in a standardized format using an extensive web-based author interface. Molecule Pages are peer-reviewed anonymously by NPG, with the assistance of the Editorial Board, and published on the Signaling Gateway. They are formally citable using digital object identifiers and will be listed in PubMed. The goal is to update the author-contributed section of each Molecule Page annually. Construction of the author interface was a formidable task, and it has been released for general use only recently. The job of authorship of a Molecule Page is a significant one, particularly for well-studied molecules. The carrots for the authors are a citable Nature publication (albeit an unconventional one) and participation in a collaborative effort that will be an enormous resource if the database becomes well populated. We are working hard to encourage participation. Improvements to the search interface for this database are a high priority.

v. Hypothesis Center (AfCS.org/rev). Rapidly accumulating AfCS data increasingly reveal unexpected patterns, phenotypes, and effects, many of which open avenues to interesting hypotheses and experiments that exceed the capabilities of AfCS laboratories. We plan to create a Hypothesis Center, designed to promote exploration of AfCS data by the scientific community. Hypotheses will be posted on the Signaling Gateway by AfCS laboratory scientists and participating investigators; in time we will consider permitting all scientists to propose and post hypotheses. Each hypothesis will suggest an explanation for a specific AfCS finding, along with experiments designed to test the explanation. The scientific community will be invited to comment on hypotheses, presenting appropriate references (or even data if desired) that they feel support or refute the relevant hypothesis. Anyone is of course free to test the hypothesis.

vi. Protocols and descriptions of reagents (AfCS.org/rev). The Data Center of the Signaling Gateway contains a very large number of detailed protocols written by AfCS scientists describing procedures, solutions, and ligands used in AfCS laboratories. These protocols of course permit standardization of procedures across AfCS laboratories, in addition to their utility for the community.

c. Efforts to Engage Others to Participate in the AfCS Endeavor. i. Data. Our data are our primary product, and they should be used by individual investigators to further their own work. The yeast two-hybrid database is an excellent example of such use, while the Hypothesis Center will be an attempt to help individual investigators find the most nourishing kernels within the bounty of the AfCS data harvest. In addition, we must encourage others who are interested in detailed analysis of AfCS data to do so, both collaboratively and independently.

ii. Modeling tools. The model building and analysis tools developed as part of this project are open-source. The models developed, the data they are tested against, and the results of model prediction will also be distributed in standardized formats. These models and the tools to manipulate them will allow other scientists interested in the dynamics of signal transduction networks to explore their own hypotheses and demonstrate how to use the tools for their own projects.

References

1. Pebernard, S. and R.D. Iggo, Determinants of interferon-stimulated gene induction by RNAi vectors. Differentiation (2004) 72:103-111.
2. Scacheri, P.C., O. Rozenblatt-Rosen, N.J. Caplen, T.G. Wolfsberg, L. Umayam, J.C. Lee, C.M. Hughes, K.S. Shanmugam, A. Bhattacharjee, M. Meyerson, and F.S. Collins, Short interfering RNAs can induce unexpected and divergent changes in the levels of untargeted proteins in mammalian cells. Proc Natl Acad Sci U S A (2004) 101:1892-1897.
3. Jackson, A.L., S.R. Bartz, J. Schelter, S.V. Kobayashi, J. Burchard, M. Mao, B. Li, G. Cavet, and P.S. Linsley, Expression profiling reveals off-target gene regulation by RNAi. Nat Biotechnol (2003) 21:635-637.
4. Schmid, A.C., R.D. Byrne, R. Vilar, and R. Woscholski, Bisperoxovanadium compounds are potent PTEN inhibitors. FEBS Lett (2004) 566:35-38.
5. Schmid, A.C. and R. Woscholski, Phosphatases as small-molecule targets: inhibiting the endogenous inhibitors of kinases. Biochem Soc Trans (2004) 32:348-349.
6. Hwang, J.I., I.D. Fraser, S. Choi, X.F. Qin, and M.I. Simon, Analysis of C5a-mediated chemotaxis by lentiviral delivery of small interfering RNA. Proc Natl Acad Sci U S A (2004) 101:488-493.
7. Tour, O., R.M. Meijer, D.A. Zacharias, S.R. Adams, and R.Y. Tsien, Genetically targeted chromophore-assisted light inactivation. Nat Biotechnol (2003) 21:1505-1508.
8. Chandy, G., T. Mukai, Q. Mi, J. Zavzavadjian, E. Gehrig, M. Verghese, E. Fung, S. Couture, W.S. Park, N. O'Rourke, and I. Fraser (http://www.AfCS.org/rev).