Design and Methods - Bioinformatics/Database - MutaJAX
Design Overview - Robust information management systems are critical for large-scale, data-intensive projects. The system we propose for the JAX NMF will provide support for workflow management and data warehousing. We first describe our database design concepts and then place them in the context of the overall system and applications architectures. We then describe plans for using existing resources to provide electronic access to information on mutant mice, regardless of where or how such mutants are generated. The database design and system architecture were developed after consulting with informatics groups at MRC-Harwell.
The critical elements to implement for the NMF include:
- a core database (MutaJax) that is flexible and scalable;
- reliable, stable, and secure data storage;
- easy-to-use Web-based interfaces for data input and retrieval;
- functions for tracking individual mice or the project as a whole;
- integration of raw data with various computational/statistical methods for determining phenotype baselines and prediction of mutants; and
- mechanisms for electronic access to information about mutant mice and their availability regardless of where or how these mice are generated.
Database Concepts and Design - The goal of the informatics infrastructure proposed is to create data structures and user interfaces that mirror (as closely as possible) the natural workflow associated with animal care and phenotype assessment. We have carefully evaluated the needs of the JAX NMF and identified key information areas: Mutagenesis, Animal Management, Sample Tracking, and Phenotyping Data. These are organizing themes for the project which, in turn, are reflected in the design of the core database (MutaJax). These database concepts closely match those used by informatics groups at the Harwell and GSF mutagenesis centers. Figure 1 shows a schema of the database concepts, including:
- Mutagenesis: The specific mutagenesis protocols applied to male mice will be recorded and will include date of injection, weight of mouse at date of injection, and dosage of mutagen.
- Animal Management: The main types of data to be recorded for animal management include basic lifecycle data for individual mice, matings, and litters. Data generated for confirmation of mutants, such as heritability testing and mapping, also will be stored in MutaJax.
- Sample Tracking: Blood, whole organs, and other biological samples may be taken from individual mice. Sample-tracking data will include the samples taken and when, where the samples are stored, what tests were performed on the samples, and the test results.
- Phenotype Screen Data: Obvious phenotypic dysmorphologies will be recorded when observed by animal technicians. More commonly, systematic phenotypic screens will be used to evaluate potential mutants, and the data collected from these will include the raw data generated by the screen and the analyzed summary for the screens performed on each mouse. The description of each screen will include the type of equipment used, a brief description of the protocol, and the type of mutant the screen is designed to detect. The raw data (which could be text, video, or image data) for each phenotype screen will be archived. We will also develop routines for importing raw data into statistical packages such as SAS, SPSS, MATLAB, and Minitab. Interpreted data and information on baselines (normal values) for various screens will be recorded in the database in a queryable form.
- Task Lists and Project Audit Functions: In addition to serving as the central data repository for the project, MutaJax will be used to generate task lists and status reports. Task lists will include summaries of date-driven tasks to be performed on individual mice by animal technicians. Examples include injections, weanings, matings, and phenotype tests. Status reports will allow project managers to generate standard reports about the progress or current status of the mutagenesis program. Examples include summaries of how many mice have been tested for a particular screen over a defined period of time, the number of potential mutants identified, the number confirmed, etc.
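The concepts listed above can be illustrated with a small relational sketch. The example below is only a minimal illustration, not the MutaJax schema: it uses Python's built-in sqlite3 module in place of the production database platform, and every table name, column name, and unit (for example `dose_mg_per_kg`) is a hypothetical choice made for the example. It shows how mutagenesis records, samples, and date-driven tasks might hang off an individual mouse, and how a technician task list could be produced with a single query.

```python
import sqlite3

# Hypothetical tables and columns, chosen only for illustration.
SCHEMA = """
CREATE TABLE mouse (
    mouse_id   INTEGER PRIMARY KEY,
    sex        TEXT CHECK (sex IN ('M', 'F')),
    birth_date TEXT                      -- ISO date string, e.g. '1999-01-05'
);
CREATE TABLE mutagenesis (
    mouse_id       INTEGER REFERENCES mouse(mouse_id),
    injection_date TEXT,
    weight_g       REAL,                 -- weight at date of injection
    dose_mg_per_kg REAL                  -- assumed unit for mutagen dosage
);
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    mouse_id  INTEGER REFERENCES mouse(mouse_id),
    tissue    TEXT,
    taken_on  TEXT,
    storage   TEXT                       -- where the sample is stored
);
CREATE TABLE task (
    task_id  INTEGER PRIMARY KEY,
    mouse_id INTEGER REFERENCES mouse(mouse_id),
    kind     TEXT,                       -- e.g. 'weaning', 'mating'
    due_date TEXT,
    done     INTEGER DEFAULT 0
);
"""


def build_db():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def tasks_due(conn, on_or_before):
    """Return open tasks due on or before the given ISO date, earliest first."""
    rows = conn.execute(
        "SELECT mouse_id, kind, due_date FROM task "
        "WHERE done = 0 AND due_date <= ? "
        "ORDER BY due_date, mouse_id",
        (on_or_before,),
    )
    return rows.fetchall()


conn = build_db()
conn.execute("INSERT INTO mouse VALUES (1, 'M', '1999-01-05')")
conn.execute("INSERT INTO mutagenesis VALUES (1, '1999-03-01', 24.5, 100.0)")
conn.executemany(
    "INSERT INTO task (mouse_id, kind, due_date) VALUES (?, ?, ?)",
    [(1, "mating", "1999-05-01"), (1, "phenotype test", "1999-08-01")],
)
print(tasks_due(conn, "1999-06-01"))  # only the mating task is due by June 1
```

Storing dates as ISO strings keeps the date comparison in the task query a plain string comparison; a production system would more likely use the native date type of its database platform.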

Figure 1 - High-level data model for MutaJax showing relationships among key concepts needed for managing information for large-scale mutagenesis at JAX. This figure is not intended to be a complete representation. Arrows indicate the type of relationship needed (double-headed arrows indicate many-to-many; single-headed arrows indicate one-to-many). The "analysis cloud" depicted for Screen Data indicates a suite of analysis tools used to extract and analyze raw data from the various phenotyping screens; while not part of the database per se, these tools are an important part of the information and workflow management process and are closely tied to the database. Raw data generated by various phenotype screens will be archived so that they can be easily accessed.
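As an illustration of the kind of tool that might sit in the "analysis cloud", the sketch below derives a baseline from control measurements and flags mice whose values deviate strongly from it. The proposal does not specify a statistical method; the mean and standard deviation baseline, the three-standard-deviation threshold, and the function names are all assumptions made for this example.

```python
from statistics import mean, stdev


def baseline(control_values):
    """Return (mean, sample standard deviation) of control measurements."""
    return mean(control_values), stdev(control_values)


def flag_outliers(measurements, base_mean, base_sd, threshold=3.0):
    """Return IDs of mice whose value lies more than `threshold`
    standard deviations from the baseline mean (assumed cutoff)."""
    return [
        mouse_id
        for mouse_id, value in measurements.items()
        if abs(value - base_mean) > threshold * base_sd
    ]


# Hypothetical control values and screen results for three mice.
controls = [10.0, 11.0, 9.0, 10.5, 9.5]
base_mean, base_sd = baseline(controls)
screen = {"A": 10.2, "B": 14.0, "C": 7.0}
print(flag_outliers(screen, base_mean, base_sd))
```

A flagged mouse would be only a candidate mutant; as described above, confirmation would still depend on heritability testing.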
System Architecture - We will implement an informatics infrastructure based on a client-server architecture. This architecture is ideal for information systems that serve multiple, distributed users. The server component will consist of the MutaJax database server and a WWW server. We will use Oracle as the database platform for the project. Oracle is an industrial relational database management system (RDBMS) widely used in the genomics community. It provides scalability and transaction control that are key for accommodating large volumes of data and new data types. Oracle has integrated backup and transaction logging mechanisms for data security and recovery in case of system malfunction. We will run Oracle on a Unix platform for maximum stability of the system. There is extensive Oracle expertise on-site at JAX. Several existing community databases at JAX are implemented in Oracle (for example, Mouse Genome Database, Gene Expression Database, and TBASE).
Both a production server and a development server will be deployed to support the JAX NMF. The production server will handle the day-to-day data entry and retrieval transactions. It also will be used as the WWW server. The separate development server will support the testing of new components and system changes.
The client component (for data entry and retrieval) will include a series of PC and Macintosh computers running the same version of Netscape Communicator Professional. Client interfaces to the server components will be implemented using standard Web-based forms (which will communicate with the database via Common Gateway Interface (CGI) scripts) and Java applets. Java is a fully object-oriented programming language with built-in networking and multimedia capabilities and internal security features for Internet applications.
Web-based interfaces to MutaJax will allow technicians to enter data into the database as they are collected, eliminating reliance on data entry from written records. Automated checks of data consistency and accuracy will be built into the interface applications used for data entry (see below).
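The kind of automated consistency check described here might look like the following sketch, which validates a single mutagenesis record before it is written to the database. The field names, the plausible weight range of 5 to 60 g, and the error messages are hypothetical; only the rule that mutagenesis is applied to male mice comes from the design described above.

```python
from datetime import date


def validate_injection(record, today=None):
    """Return a list of error messages for a mutagenesis record.

    An empty list means the record passed every check.
    """
    today = today or date.today()
    errors = []

    # Mutagenesis protocols are applied to male mice.
    if record.get("sex") != "M":
        errors.append("mutagenesis is applied to male mice only")

    # Assumed plausibility range for body weight, in grams.
    weight = record.get("weight_g")
    if weight is None or not (5.0 <= weight <= 60.0):
        errors.append("weight_g missing or outside plausible range")

    dose = record.get("dose")
    if dose is None or dose <= 0:
        errors.append("dose must be a positive number")

    injected = record.get("injection_date")
    if injected is None or injected > today:
        errors.append("injection_date missing or in the future")

    return errors


record = {
    "sex": "M",
    "weight_g": 25.0,
    "dose": 100.0,
    "injection_date": date(1999, 3, 1),
}
print(validate_injection(record, today=date(1999, 6, 1)))  # no errors expected
```

Returning all errors at once, rather than stopping at the first, lets a data entry form show the technician everything that needs correcting in one pass.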
Applications Architecture - We will use a multi-layer architecture to develop and deploy client interfaces and applications. The "layers" of a typical application include a Presentation Layer (user interface), the Transaction Services Layer (rules for passing data from the interface to the database and vice versa), and Database Access Layer (details of the database transactions). Multi-layer applications have a number of advantages, including scalability and performance, component reuse, rapid development and implementation times, and ease of maintenance. With this architecture it is possible to change details of one of the layers without changing the others. For example, one could change the details of a database structure and data access without having to change any aspect of the user interface. Applications development for the mutagenesis programs at Harwell and GSF is based on a similar software engineering paradigm. Employing a similar strategy at JAX increases the likelihood that some software components common to all mutagenesis programs could be shared across multiple institutions.
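The separation of layers can be sketched in a few lines. The example below is an illustration only, with hypothetical class and function names and with Python's sqlite3 module standing in for the production database; it is not a description of the software used at JAX, Harwell, or GSF. The point it shows is that SQL appears only in the Database Access Layer, so the table layout could change without touching the other two layers.

```python
import sqlite3


class MouseDataAccess:
    """Database Access Layer: the only code that knows SQL and table layout."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS weight (mouse_id INTEGER, grams REAL)"
        )

    def insert_weight(self, mouse_id, grams):
        self.conn.execute("INSERT INTO weight VALUES (?, ?)", (mouse_id, grams))

    def weights_for(self, mouse_id):
        cur = self.conn.execute(
            "SELECT grams FROM weight WHERE mouse_id = ? ORDER BY rowid",
            (mouse_id,),
        )
        return [row[0] for row in cur]


class WeightService:
    """Transaction Services Layer: rules applied between interface and database."""

    def __init__(self, dao):
        self.dao = dao

    def record_weight(self, mouse_id, grams):
        if grams <= 0:
            raise ValueError("weight must be positive")
        self.dao.insert_weight(mouse_id, grams)

    def history(self, mouse_id):
        return self.dao.weights_for(mouse_id)


def render_history(service, mouse_id):
    """Presentation Layer: formats data for display and knows nothing about SQL."""
    values = service.history(mouse_id)
    return f"Mouse {mouse_id}: " + ", ".join(f"{g:.1f} g" for g in values)


service = WeightService(MouseDataAccess(sqlite3.connect(":memory:")))
service.record_weight(7, 21.5)
service.record_weight(7, 22.0)
print(render_history(service, 7))
```

Replacing `MouseDataAccess` with a class that talks to a different database, while keeping the same two methods, would leave `WeightService` and `render_history` unchanged.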
Community access to mutant information and mouse resources - The MutaJax database will play a critical role in the successful functioning of the JAX NMF, for animal management, phenotype assessment, heritability testing, and project development. However, because a major goal of the JAX NMF is to provide information about new mutants to the research community and access to the mice that carry them, we will 1) regularly export data to the Mouse Genome Database (MGD; Blake et al., 1997); 2) provide public access to detailed phenotypic screening data; and 3) include JAX NMF mouse stock information in the International Mouse Strain Resource (IMSR; [257]). MGD, as the central genomic database for mouse, will provide an integration site for MutaJax data on phenotype and mutant alleles, so that data can be searched and analyzed in the context of all other known phenotypes and alleles. Thus, members of allelic series can be contrasted, phenotypic similarities among diverse mutations and strains can be analyzed, and different mutant sources can be compared. We also will provide researchers with access to the archived, detailed phenotypic screening data and protocols for each mouse assayed in the NMF. These data will be essential for researchers wishing to adapt protocols for their own use, develop progeny testing procedures, and compare NMF mouse phenotypes with others they are testing. Contributing NMF stock information to IMSR, which currently serves as a Web-based 'look up' catalog of mouse stocks (live and frozen) available from Harwell and JAX, will allow users to search for particular mutants and, through the links provided, contact the holding institution for the stocks of interest. IMSR is expanding to include searchable access to other mouse resource holders such as the ORNL and the European Mouse Mutant Archive (EMMA), with the ultimate aim of serving as a 'look up' for all mouse stocks worldwide and a coordination point for those seeking specific mutants and strains.