Oct 10 2008

Physical Database Design, Hardware, and Related Issues

Published by admin at 10:53 am under Technology

By Aurelie A. Peralta

Understanding Application Types

In database theory, broadly speaking, a transaction is a single, atomic SELECT, INSERT, UPDATE, or DELETE. However, with regards to application types, a transaction is generally more loosely defined as a business transaction, possibly containing multiple INSERTs, UPDATEs, or DELETEs. In addition, DML truly refers to SELECT, INSERT, UPDATE, and DELETE. DML and transaction usually mean write-only, or modify-only. To distinguish the SELECT operation as read-only, the term query is used.

There are three main common application types in the world of database system applications:

1. OLTP: On-line Transaction Processing. An OLTP system is an application that contains heavy DML, transaction-oriented activity; primarily updates, but also inserts and deletes. Classic examples are reservation systems such as those used by airlines and hotels. OLTP systems can have high concurrency.

2. DSS: Desicion Support System. A DSS is typically a large, read-only database with historical content, and is generally used for simple canned or ad hoc queries. A good example of a DSS is a database behind an organization’s intranet.

3. Batch: A batch system is a non-interactive, automatic application that works against a database. It usually contains heavy DML and has low concurrency. Classic examples are production databases and operational databases relative to DWs.

Some less common application types include:

1. OLAP: On-line Analytical Processing. An OLAP system offers analytical services, as the name implies. This means mathematics, statistics, aggregations, and high computation. An OLAP system doesn’t always fit the OLTP or DSS molds. Occasionally, it is a cross between the two. In addition, some people simply view OLAP as an extension or an additional functional layer on top of an OLTP system or DSS. ROLAP stands for Relational OLAP.

2. VCDB: Variable Cardinality Database. This type of database is frequently a back-end for a processing system that can causes the tables in that database to grow and shrink considerably during the processing phase, which otherwise may be constant or periodic. Cardinality refers to the number of rows in a table at a given time.

Using Quantitative Estimates

Quantitative estimating of any sort is an attempt to quantify, or measure, some process or product. With databases, the two main types of quantitative estimates are transaction analysis or volume analysis and sizing analysis.

Transaction Analysis

Transaction analysis is simply putting numbers on the database system. Different measures mean different things, and apply to certain kinds of database systems. The most typical measures, or metrics, include the minimums, averages, or maximums of the following:

1. Number of concurrent users

2. Response time

3. Elapsed time

4. Number of transactions

5. Number of concurrent programs

6. Number of bytes read or written

Sizing Analysis

Sizing is perhaps a more widely known activity, if not widely practiced often enough by all DBAs. With sizing we ask “How much?” with regards to data storage. The fundamental thing is simply that a table with n rows of b max bytes per row will need at least nxb bytes of storage. Remember, too, that the figure you size for is usually based on tables and indexes alone. As with all RDBMS vendors, there are many more achitectural considerations.

Denormalizing

Denormalization refers to dropping the level of your tables’ normal forms back down a few notches for physical design, performance tuning, or other reasons. Do not do this unless you have a very good reasons. A shortage of disks with no budget for new ones may be a good reason. Expecting to have poor performance without evidence, as opposed to actually having poor performance, is not a good reason. In fact, even poor performance itself does not immediately indicate the need to back off on your logical design. Your first step should be performance tuning. Denormalization should always be a last resort.

Understanding the Storage Hierarchy and RAID

One of the most important things a DBA can learn about is the storage hierarchy and its associated tradeoffs. This, more than anything, helps explain a lot about physical design and performance tuning. One of the key ideas behind the storage hierarchy is the electromechanical disadvantage - anything that has a motor or moving parts is inherently slower than something that is solely electronic.

Clearly, memory is faster than disk. As we go up the pyramid, speed increases (access time), cost increases (per unit) but storage decreases.

Understanding RAID

RAID, or Redundant Array of Inexpensive Disks, is perhaps a misnomer. Since its inception, the disks that make up RAID have never really cost any less than regular SCSI (Small Computer Systems Interface) disks. In addition, RAID requires special hardware, software, or both to work, at added cost. So the I in RAID is not quite correct.

RAID is a set of disks that can work in parallel to reduce I/O time by a factor of how many disks make up the set, an important thing for databases. RAID works in parallel through a technique known as striping, as opposed to ordinary disk storage, which works in serial. This is simply writing a single file using stripes, or chunks of the file, across multiple disks in parallel. The stripes are some multiple size of a physical data block, the smallest unit of disk I/O. Typically, a data block is made up of 512 bytes. This is true for most UNIX systems, VMS, and DOS/NT.

RAID also offers real-time disk failure recovery, another important thing for databases. RAID can offer this level of availability through parity information, which is also written out to a disk or disks.

RAID levels most used in industry today are 0, 1, 3 and 5. All other levels are either not used or are some variation of these four. RAID levels are offered by vendors that number 7 and higher, but these are rarely seen in practice.

RAID 0 -basic striping with no parity.

RAID 1 - known as mirroring, or sometimes duplexing, again with no parity.

RAID 3 - is striping again, except now with a single, dedicated parity disk.

RAID 5 - also striping with parity, except rather than using a single, dedicated disk, it stripes the parity along with the data across all disks.

Understanding Bottlenecks in a DBMS

Most often in the past, DBMS have always been accused of being I/O bound, or disk-bound. This is the same as saying that a DBMS is bottlenecked in a system by its reading to and writing from disk. A disk has much slower access speed than memory or CPU. And this actually has been the case for many database applications over the years. It is especially true for DSSs, VLDBs, and DWs, because huge amounts of data must be moved with single queries. However, this is not always the case. OLTP and OLAP systems can be often be memory- or CPU-bound.

If a database hardware server has a low memory-to-disk ratio, meaning that the application is memory-poor, that database will suffer considerably due to the lack of room in the memory for data and code caching.

For a client/server system to function efficiently, proper application segmentation must occur, and the network must not be overloaded. Further the network hardware and software should be modernized to take advantage of the current networking capabilities.

Making Your Platform Selection

When choosing what type of hardware database server and operating system is right for your application, there are many things to consider, including:

1. Application Type

2. Quantitative Estimates

3. Current Environment: Basically, what are your predominant platforms now?

4. Trends: Where is the industry headed with your current environment and what you are now considering.

5. Processing Needs: Is your need real-time, less time-critical, or periodic?

6. Storage Needs

7. Staff Capabilities

8. Time Constraints

9. Porting and Partnerships: Does the RDBMS vendor have cozy relationships with the OS and hardware vendors?

10. Integration

Operating System Integration and General Memory/CPU Recommendations

Essentially, memory is very fast electronic storage. It stores instructions and data. For DBMS, the most important thing is that the OS can yield some of its memory to it. Then the DBMS can do with it what it will. This is why a DBMS is sometimes referred to as a micro-OS, or an OS within an OS, or an OS on top of an OS.

Locking, a key component to DBMSs, is also handled through memory structures. Shared resources are secured one at a time from competing processes. A DBMS either handles its own locking, does it partially with the OS, or yields locking duties to the OS.

RDBMSs have come a long way with regards to CPU utilization. With the advent and evolution of multiprocessor machines within the last 10 years, many things have changed. RDBMSs software now is either fully multithreaded or pseudo-multithreaded to take advantage of the processing power of the multiprocessing machines.

Physical Design Principles and General Hardware Layout Recommendations

Physical database design is actually pre-tuning, or nothing more than the second stage of tuning. (Logical database design is the first stage) There are many design principles, but the major ones always include the following:

1. Divide and Conquer. Partitioning, segmentation, and parallelization are all extensions of the divide-and-conquer algorithm design approach. If a process can be broken up to pieces that can be run concurrently, it is said to be parallelizable.

2. Preallocate and precompile. Allocate your resources ahead of time, rather than let the software do it for you on-the-fly, or dynamically.

3. Be proactive. Anticipate the problems.

4. Bulk, block, and batch. Use mass transit. Batch things together that make sense to be batched.

5. Segment the application appropriately. This could be considered a subheading under divide and conquer.

The major objective of physical design and tuning is to eliminate, or at least minimize, contention. Contention is when two or more pieces of software compete for the same resource.

Reference: Using Oracle 8 by William G. Page Jr. and Nathan Hughes, Copyright 1998 by Que Corporation

2 Responses to “Physical Database Design, Hardware, and Related Issues”

  1. JessicaFexon 11 May 2009 at 8:20 am

    Wow! Thank you very much! I always wanted to write in my site something like that

  2. Ariananarlon 14 May 2009 at 8:55 am

    Hi there, not sure that this is true, but thanks

Trackback URI | Comments RSS

Leave a Reply