Computerized analysis is based on dividing the problem into a large number of pieces. Within each piece, a model is built that describes the behavior of that piece. The individual pieces are then assembled to form an overall model of the product.
As the number of pieces used to model the product is increased, the approximate solution obtained from the computer approaches the exact solution of the original problem. The maximum number of pieces that can be used, and therefore the accuracy of the analysis, is determined by the capacity of the computer hardware and software.
This modeling process is often called discretization, because the original problem is replaced by a finite number of discrete pieces. Analysis programs based on this approach include finite element, finite difference, boundary integral and moment method techniques. These formulations exploit the strength of digital computers, which perform large numbers of similar tasks rapidly.
Mathematically, this simulation involves solving a large system of simultaneous equations
[A]{X} = {B}
where
[A] is an N-by-N coefficient matrix that represents the product being analyzed,
{B} is an N-by-1 right-hand side vector that represents the environment that the product is operating in, and
{X} is an N-by-1 solution vector being computed, which represents the response of the product to the operating environment.
As N, the size of the matrix and vectors, is increased, the solution obtained by the computer approaches the exact solution of the problem.
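The convergence described above can be seen on a small model problem. The sketch below is only an illustration, not part of any particular analysis program: it assumes a one-dimensional finite difference discretization of -u'' = pi^2 sin(pi x) with u(0) = u(1) = 0, whose exact solution is sin(pi x). The function names `solve_dense` and `max_error` are hypothetical, and the dense Gaussian elimination is a plain textbook version.

```python
import math

def solve_dense(a, b):
    """Solve [A]{X} = {B} by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]   # work on copies so the inputs are untouched
    x = b[:]
    for k in range(n):
        # Pick the largest pivot in column k and swap it into row k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        x[k], x[p] = x[p], x[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            x[i] -= f * x[k]
    # Back substitution on the upper-triangular system.
    for k in range(n - 1, -1, -1):
        s = sum(a[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (x[k] - s) / a[k][k]
    return x

def max_error(n):
    """Discretize the model problem with n interior points; return the largest nodal error."""
    h = 1.0 / (n + 1)
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i in range(n):
        xi = (i + 1) * h
        a[i][i] = 2.0 / h**2
        if i > 0:
            a[i][i - 1] = -1.0 / h**2
        if i < n - 1:
            a[i][i + 1] = -1.0 / h**2
        b[i] = math.pi**2 * math.sin(math.pi * xi)
    x = solve_dense(a, b)
    return max(abs(x[i] - math.sin(math.pi * (i + 1) * h)) for i in range(n))

errors = {n: max_error(n) for n in (10, 20, 40)}
for n, e in errors.items():
    print(n, e)
```

For this model problem the printed error shrinks each time N is doubled, which is the behavior the text describes.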
The following flowchart illustrates the steps involved in performing this type of computer simulation.
1. Form [A] and {B}. The process begins by forming local equations for each of the discrete pieces. These equations are in the form of contributions to the matrix [A] and right-hand side vector {B}. For large problems, this data is usually written to disk as it is generated.
2. Solve [A]{X} = {B}
3. Process solution {X}
If the operating environment is severe, then the solution values in {X} will change the original model represented by [A] (nonlinear analysis). In addition, if the solution {X} indicates a bad product design, the process must be repeated (design optimization). In either case, a new matrix [A] is formed and the solution is repeated until convergence is obtained.
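The repeat-until-convergence loop can be sketched as follows. This is a minimal illustration under stated assumptions: the 2-by-2 model in `form_matrix` is hypothetical (its diagonal grows with the response), the update is a simple fixed-point iteration that re-forms [A] from the latest {X}, and the names `form_matrix`, `solve_2x2` and `iterate` are invented for the example.

```python
def form_matrix(x):
    """Hypothetical model: the diagonal stiffens as the response grows."""
    return [[2.0 + x[0] ** 2, -1.0],
            [-1.0, 2.0 + x[1] ** 2]]

def solve_2x2(a, b):
    """Solve a 2-by-2 system [A]{X} = {B} with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def iterate(b, tol=1e-10, max_iter=200):
    """Re-form [A] and re-solve until {X} stops changing."""
    x = [0.0, 0.0]
    for step in range(1, max_iter + 1):
        a = form_matrix(x)         # Step 1: form [A] from the latest {X}
        x_new = solve_2x2(a, b)    # Step 2: solve [A]{X} = {B}
        change = max(abs(p - q) for p, q in zip(x_new, x))
        x = x_new
        if change < tol:           # Step 3: process {X} and test convergence
            return x, step
    raise RuntimeError("no convergence within max_iter steps")

x, steps = iterate([1.0, 1.0])
print(x, steps)
```

In a real analysis, Step 2 dominates the cost of every pass through this loop, so any saving in the solver is multiplied by the number of iterations.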
As we increase N, the time and storage requirements for Steps 1 and 3 increase differently than the requirements for Step 2. To illustrate this point, let us look at a model using a full complex nonsymmetric matrix [A] run on a computer operating at 100 Mflops. In this example, we assume that the time and storage requirements for Steps 1 and 3 are the same as Step 2 when N is 1000.
As we increase N from 1,000 to 100,000, the time required for Steps 1 and 3 increases linearly, from 27 to 2,675 seconds. Increasing the number of pieces by a factor of 100 results in increasing the processing time by a factor of 100.
In Step 2, however, the scaling is drastically different. The time required to solve the system [A]{X}={B} increases as the cube of N. As a result, increasing N by a factor of 100 results in the processing time increasing by a factor of 1,000,000.
Similarly, as we increase N from 1,000 to 100,000, the space required for storage by Steps 1 and 3 increases linearly, from 16 to 1,600 Mbytes. Increasing the number of pieces by a factor of 100 results in increasing the space required to store them by a factor of 100.
However, because [A] is an N-by-N array, the storage requirements for Step 2 increase as the square of N. As a result, increasing N by a factor of 100 results in the storage for [A] increasing by a factor of 10,000.
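The time and storage figures above can be reproduced with a short calculation. The sketch below rests on assumptions that are not stated in the text but are consistent with its numbers: each matrix entry is a double-precision complex number (16 bytes), one complex multiply-add costs 8 real floating-point operations, and the solve takes about N^3/3 complex multiply-adds, as a standard LU factorization does. All names are hypothetical.

```python
FLOPS = 100e6             # machine speed from the example: 100 Mflops
BYTES_PER_ENTRY = 16      # assumption: double-precision complex entries
FLOPS_PER_MULT_ADD = 8    # assumption: one complex multiply-add = 8 real flops

def solve_seconds(n):
    """Estimated Step 2 time, assuming n^3 / 3 complex multiply-adds."""
    return FLOPS_PER_MULT_ADD * n**3 / 3 / FLOPS

def matrix_mbytes(n):
    """Storage for a full N-by-N complex matrix, in Mbytes."""
    return BYTES_PER_ENTRY * n * n / 1e6

def linear_scale(value_at_1000, n):
    """Steps 1 and 3: cost grows in proportion to N."""
    return value_at_1000 * n / 1000

for n in (1_000, 10_000, 100_000):
    print(n,
          round(linear_scale(solve_seconds(1000), n)), round(solve_seconds(n)),
          round(linear_scale(matrix_mbytes(1000), n)), round(matrix_mbytes(n)))
```

With these assumptions, N = 1,000 comes out at roughly 27 seconds and 16 Mbytes, in line with the figures quoted above. The cubic and quadratic columns then grow by factors of 1,000,000 and 10,000 when N reaches 100,000.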
One of these three issues will limit how far N can be increased on a given machine. On systems using software designed for small values of N, the performance deteriorates so rapidly when [A] exceeds physical memory that N cannot be increased further.
In the example above, if data is read from disk at 3 Mbytes/second and each item is reused 267 times, the processor is supplied with about 800 Mbytes/second of data (3 x 267 = 801) and operates at 100% of capacity.
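The arithmetic behind this data reuse figure is simple enough to check directly. The helper names below are hypothetical and the calculation is only a sketch of the relationship between disk rate, reuse count and the rate seen by the processor.

```python
import math

def effective_rate(disk_mbytes_per_s, reuse):
    """Data rate seen by the processor when each item read is reused `reuse` times."""
    return disk_mbytes_per_s * reuse

def required_reuse(target_mbytes_per_s, disk_mbytes_per_s):
    """Smallest whole reuse count that reaches the target rate."""
    return math.ceil(target_mbytes_per_s / disk_mbytes_per_s)

print(effective_rate(3, 267))    # rate delivered with the reuse count from the text
print(required_reuse(800, 3))    # reuse count needed to sustain 800 Mbytes/second
```

The same relationship shows why reuse matters more as disks fall behind processors: halving the disk rate doubles the reuse needed to keep the processor busy.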
By developing special software to read data ahead of processing, data will be available when required by the processors. As a result, the disks and processors will run concurrently and no time will be lost waiting for data.
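The read-ahead idea can be sketched with a background reader thread and a small bounded buffer. This is an illustration only and says nothing about how FMS itself is implemented: `read_blocks` is a hypothetical stand-in for disk reads, `process` is a placeholder computation, and the buffer depth of 2 is an arbitrary choice.

```python
import queue
import threading

def read_blocks(n_blocks, block_size):
    """Hypothetical stand-in for reading matrix blocks from disk."""
    for k in range(n_blocks):
        yield [float(k)] * block_size

def prefetch(source, depth=2):
    """Yield items from `source` while a background thread reads ahead."""
    buf = queue.Queue(maxsize=depth)   # bounded, so the reader stays at most `depth` blocks ahead
    done = object()                    # sentinel marking the end of the data

    def reader():
        for item in source:
            buf.put(item)              # blocks when the buffer is full
        buf.put(done)

    threading.Thread(target=reader, daemon=True).start()
    while True:
        item = buf.get()               # blocks only if the reader has fallen behind
        if item is done:
            return
        yield item

def process(block):
    """Placeholder computation on one block."""
    return sum(v * v for v in block)

total = sum(process(b) for b in prefetch(read_blocks(8, 1000)))
print(total)
```

The bounded queue is the design choice that matters here: it lets reading and processing overlap while capping the memory held by blocks that have been read but not yet used.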
The solutions listed above all depend on special software. Providing that software is exactly what the Fast Matrix Solver (FMS) is designed to do.
The following figure shows the original analysis program with the equation solution phase replaced by FMS.
1. Form [A] and {B}. FMS captures the matrix [A] and right-hand side vector {B} as they are generated. Parts of [A] may be transferred by rows, columns, blocks or finite elements. This data is stored directly in the FMS Database. Depending on the problem size, this database may physically reside in memory or on disk.
2. Solve [A]{X} = {B}
3. Process solution {X}
Steps 1 and 3 of this analysis depend on the physics of the problem and the modeling technique used. Software and hardware efficiency is not a critical issue for these steps. By contrast, Step 2 is independent of the physics and modeling technique, but pushes the hardware to the limit. For this reason, FMS was designed from a hardware point of view.
There is an optimized version of FMS for most computer platforms. Each version makes maximum reuse of registers, cache and memory. On systems equipped with multiple processors, the processors are operated in parallel to reduce run time. On all systems, disks are operated in parallel and transfer data concurrently with processing.
While the internals of FMS are machine specific, the interface to your application program is generic. This allows you to maintain a single version of your application while achieving peak performance on each computer system.
This team may be sorted by type of technology, hardware and software, as shown in the following figure:
System Integrator: They integrate the hardware and software into a system you can use. Also included is the operating system, which may be purchased separately.

Applications: These are programs that you either develop or purchase. They perform a specific function, from word processing to sophisticated scientific analysis.

Chip Manufacturer: Current fabrication technology requires extremely expensive facilities to produce the fast, high-density processor and memory components of today's computers. Companies specialize in this technology, producing a variety of components for various system integrators.

Middleware: At the base technology level in software is an industry called middleware. This is software that is closely coupled to the hardware and performs a generic function for a variety of applications. Database software is a typical middleware product.
FMS is middleware that combines database and math library functions. It extends the operating system by providing file striping, asynchronous I/O, memory management and parallel processing. FMS also interacts with the hardware at a low level with optimized kernels designed for each specific platform.
Called as a library from programs written in FORTRAN or C, FMS provides a back door to the chip and operating system technology not normally available in programming languages.
While the internals of FMS and its interface to the machine are constantly changing to take advantage of new technology, the application program interface has remained constant. This allows your applications to take advantage of new hardware technology with no development effort.
With today's world economy, maintaining a competitive advantage is critical. Products must continuously be refined to take advantage of new materials, manufacturing techniques and changing customer requirements. Computer simulations are well recognized as a vital ingredient in meeting these demands. By pushing the envelope of computer performance, FMS provides a technical advantage that helps you meet your goals.