Design-Build DATELINE
The Journal of the Design-Build Institute of America

June 2005

The Keys to Success for Mission-Critical Projects


Definitions

As with any technical building type, a set of common terms is germane to mission-critical spaces. For your reference:

1x / 2x / 4x / etc — Indicates the number of processors installed in a given server.

24/7 or 7x24 — Implies seven days a week, 24 hours a day operation, forever.

9s — See Sigma.

Active Space — Space on which IT equipment resides, excluding circulation space, area occupied by air handlers, or electrical equipment or access.

Application — The manifestation of the installed software. Outlook, Excel, and Word are all applications. If applications are online and are either ready or in use, they are known as being available.

Availability — The actual time an application is available to an end user. This can be calculated but is typically measured by an application’s run time.

BC/DR (Business continuity/disaster recovery) — The plans and systems that, when functioning together, allow a business to respond to a natural disaster or unforeseen event with little or no loss of IT functionality.

BICSI — The industry body responsible for the Registered Communication Distribution Designer (RCDD) and presently working on the latest/third-generation data center construction standard. See http://www.bicsi.org/

Blade — An individual server that has either one, two, or four processors mounted on a motherboard-like backboard that in turn is mounted into a multi-blade chassis. This is basically a server that has no sheet metal and a centralized power supply and network connections.

Business-Ready Date — The date on which the data center is ready to assume real-world IT processing operations. This date follows the completion of construction by either weeks or months.

Churn — The rate at which an entire platform type, system, or facility completely changes out its IT equipment.

Compaction — The phenomenon of increasing IT density, seen when older equipment is removed and replaced with new equipment that has more processors per square foot than the former systems.

Concurrently Maintainable and Operable (CMO) — A system topology that allows for maintenance of the installed electrical or mechanical systems in a manner that is transparent to the ongoing system operation. It may possess fault-tolerant system attributes.

Fault-Tolerant — A system topology that is concurrently maintainable and operable, but also possesses the ability to self-heal system failures while preventing the interruption of services to a given system, platform, or component. This definition is as applicable to utility systems as it is to software and hardware.

Enterprise — Denoting concerns of systems that span the entire company or a company-wide function or service.

Enterprise Server — A large server that consists of several high-performance servers wholly contained within it. This is an IT system that resides in the IT space between smaller servers and mainframes and in some cases offers similar performance to mainframe systems. These systems are manufactured by a host of firms, but the HP SuperDome and Sun SunFire servers are the best known of the group.

Failure — A failure of a given system or application that may or may not result in the disconnection of an end-user service.

High Density — Originally grew out of the Wintel systems, but now describes any computing or storage system that exceeds 150 W/SF.

HIPAA — Health Insurance Portability and Accountability Act. See http://www.hhs.gov/ocr/hipaa on the U.S. Department of Health and Human Services website.

MEP — Mechanical, electrical, and plumbing, or the building’s utility systems.

Mission-Critical — Any space or function that, should it cease to operate, would cause financial or operational injury to the organization it supports.

Outage — A failure that results in a disconnection of any IT service or the failure of an application.

Platform — An individual storage system, server, or network component. This is typically seen as a single box or enclosure.

Processor — The physical computation chip set in a server. There may be more than one in a server or on a blade.

Power Density — The manifestation of total power consumption when viewed as the total UPS power demand or installation divided by the total raised floor area in which the IT equipment is installed. Some parties also use the total power consumption divided by the active floor space.

Reliability — The mathematical calculation of the likelihood of an outage’s occurrence.

SOx or SarbOx — The Sarbanes-Oxley Act for corporate financial reporting and management. See http://www.sec.gov/spotlight/sarbanes-oxley.htm and http://www.sec.gov/divisions/corpfin/faqs/soxact2002.htm on the U.S. Securities and Exchange Commission’s website.

Sigma — The number of significant figures in a reliability calculation that would indicate the theoretical time a system would be “up” or available. This is commonly expressed as a series of 9s; for example, 99.995 percent uptime indicates a 4th Sigma facility. This figure is never rounded up to the next Sigma.
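As a rough illustration of the Sigma count, the sketch below counts whole 9s in an availability percentage without ever rounding up. The function name `count_nines` is hypothetical, and treating the count as the floor of the negative base-10 logarithm of the unavailability is one reading of this definition rather than an industry-mandated formula.

```python
from decimal import Decimal, ROUND_FLOOR


def count_nines(availability_percent: str) -> int:
    """Count whole 9s in an availability figure such as "99.995" (hypothetical helper)."""
    availability = Decimal(availability_percent)
    if not (Decimal(0) <= availability < Decimal(100)):
        raise ValueError("availability must be at least 0 and below 100 percent")
    # Fraction of time the system is NOT available, e.g. 0.00005 for 99.995 percent.
    unavailability = (Decimal(100) - availability) / Decimal(100)
    # floor(-log10(unavailability)) is the number of leading 9s, so
    # 99.995 percent stays at 4 and is never rounded up to 5.
    return int((-unavailability.log10()).to_integral_value(rounding=ROUND_FLOOR))


print(count_nines("99.995"))  # 4
```

Passing the figure as a string keeps the arithmetic exact in `Decimal`, which avoids floating-point surprises at boundaries such as 99.9 percent.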

Single Path — A non-redundant utility system topology that possesses a single point of failure.

Single Point of Failure — A singular device or connection within a utility, IT, network, or application system that cannot be maintained without interruption; should it fail or require maintenance, the downstream power or end-user service would be interrupted.

System — A system is a collection of platforms, organized into a single IT function, such as e-mail or the external website.

Tier — Indicates the type of electrical and mechanical system for a data center, with a higher Tier rating indicating a more reliable or redundant MEP infrastructure.

Uptime — Same as availability.

 


Mission-Critical Team Partners

This is a partial list of who needs to be at the project table:

  • Architect.
  • Consulting engineers.
  • Owner’s project or construction manager.
  • General contractor.
  • Electrical contractor.
  • Mechanical contractor.
  • Owner’s rep for security, corporate real estate and finance.
  • Owner’s IT rep — could be overall or individual group leaders.


Successful Models for Delivering Mission-Critical Work

For most data centers the preferred model for delivering this work is:

  • Collaborative.
  • Negotiated.
  • Highly technical.
  • Anchored by a large capital equipment purchase.
  • Most often design-build or design-assist with substantial front-end pre-con efforts by the builders.
  • Large, back-end testing and commissioning effort.
  • Extensive client training and turn-over.


What are the key considerations when constructing mission-critical spaces?

  • The site location respects the risk management profile for the company and addresses whatever federal laws are in force for physical dispersion of this kind of building.
  • Building can survive any weather, seismic, or civil event that is likely for the given location.
  • There is ready access to redundant telecommunications providers with multiple paths to and from the site.
  • Redundant power and cooling systems can be maintained while they are operating.
  • Uninterruptible power supply (UPS) systems provide clean, continuous power to the IT equipment, coupled with a power distribution system that can automatically select the “best available” UPS power source. Collectively, these are known as the critical power systems, and they can heal themselves in case of a fault or error.
  • Extensive building and utility system monitoring and controls.
  • Network systems are mapped against and agree with the systems and processes they support.
  • Site access, building perimeter, and interior security in layers.
  • Flexibility for expansion, both internal and external, and a known strategy for addressing this at the time the project commences.

24/7, uptime, operational continuity, disaster recovery: it all sounds like a Tom Clancy book. Mission-critical spaces aren’t mystical, however; they are part of everyday business.

Over the past ten years, mission-critical has been used to describe everything from services to sports drinks. While the current usage is mixed, the earliest acknowledgment of mission-critical operations sprang from the nuclear industry and the military. The phrase was popularized by the major New York-based financial and banking sectors sometime in the early 1980s to describe their information technology (IT)-supported businesses.

Every business relies on information to function, and automation and information technology have been key components in the productivity gains that have supported the recent economic spurt. Some enterprises — such as banks, insurers, and governmental agencies — process information in the normal course of business. Others use information to run their business better and to serve their customers more comprehensively.

Today, mission-critical has come to mean any service or manufacturing process so vital to any business that, should it fail, the consequences to the business, its customers, or its employees would be catastrophic. If a mission-critical process stops, so does the business it supports, and it knows no limit in size, scope, and urgency. Mission-critical thinking is simply a reflection of the business it supports and the environment in which it works.

For most clients, mission-critical space includes data centers, customer call centers, the corporate network, and for some industrial users, certain manufacturing operations. In all instances, these facilities represent strategic operations and needs for a given company. Every business has mission-critical space, from the major money center bank with several data centers worldwide to the small general contractor with a single server room and UPS power system supporting his e-mail, payroll, billing, and estimating systems. Mission-critical systems and spaces touch all of our lives.

Who Delivers Mission-Critical Work and How Is It Typically Done?

Mission-critical buildings and spaces are ubiquitous — they are the way stations of the information highway. Although they are commonplace, they are nonetheless a very specialized type of space that requires knowledge of UPS power systems and fast-track low- and mid-rise construction. Specifically, a sophisticated general contractor with a strong MEP coordination function and a highly experienced electrical contractor, who is responsible for the majority of labor hours and equipment on the project, are vital for success. Aside from a major portion of the project’s labor, the electrical contractor is typically responsible for the integration of the critical power system, the generator plant, and the electrical and mechanical monitoring systems.

The design-build or team-based approach is particularly valuable to mission-critical work for three reasons. First, the speed at which the project must be delivered demands that the construction and design professionals work together to develop the final design and construction detailing. Most large-scale project designs are unique to a given end-user, so copying previous work typically is not an option. Without the ability to multi-task the design and build functions of the project via multiple packages while ordering the design in a manner that enables ongoing construction, projects would take far longer to complete and would be fraught with errors. This is an arena in which contractor-led delivery is a natural fit.

Second, the small details of the project that are vital for the assured operation of the facility are never completed at the time of the permit set or the early structural packages. Only by working through the details with the delivery team can an owner and consulting engineer be assured that their project will work correctly and be delivered on time.

Third, in addition to the structural steel, a majority of the medium- to large-scale electrical and mechanical systems are purchased very early in the design process to protect the business-ready date of the project. The purchasing entity could be the general contractor, the electrical and mechanical subs, or a third-party purchasing agent. Whoever it is, they are always engaged early in the project. Without the construction professionals’ ability to secure the appropriate long lead time equipment early, the project would fall months behind. In a majority of the cases, the major electrical and mechanical equipment is purchased during the schematic or design development phases of the work.

Thankfully, nearly all of this work is delivered via a team-based project approach, where the constructors, design professionals, IT users and the entire team work together for a common goal. Whether it’s a master-builder or a team-based approach, the contractor is one of the most critical team members. Only the prime contractor possesses the interwoven skills of scheduling, budget control, workflow, project approvals, and testing/turnover that are key to the success of the effort.

Mission-critical buildings bring out the best of the master builder or team-based project delivery. This is anchored in the fact that the pool of qualified firms is somewhat limited for the scope of the business, and that the expertise to successfully deliver these spaces is considerable. This is not because the building itself is any more complex than, say, a laboratory or hospital. It’s just that the details, no matter how small, could lead to a failure that takes out the whole business.

For data centers, the devil is in the details.

So What?

When an outage occurs, it can be devastating to a business. When they can’t log onto your website, your customers are going elsewhere. When they’re left on hold, they will hang up and call someone else. When you can’t trade a stock, you lose money. When you can’t settle with the Federal Reserve Bank, you get fined. At best, you are never delighted at a failure of an expected or promised service level. Certainly, it will cost a great deal of money. At worst, it could break a business.

There is no upside to an outage: information can be lost, employees sit idle and can’t work, customers are upset and may take their business elsewhere, or you may be fined by the federal or state government. An outage, however, specifically means the disconnection of the end user from a system or application, so not all incidents are outages.

The good news continues. Over 70 percent of infrastructure failures and nearly 50 percent of IT system failures are a direct result of human error. In this vein, the built environment and how it’s operated addresses this reality and seeks to minimize this risk by strengthening the attendant utility systems. We’ve seen the enemy, and it is us.

What does an outage cost a given business? Quite a bit.

Outage costs are further compounded by the fact that an average outage recovery takes four hours. So for the major banks, for example, this is a $30M-plus issue. Thus, an outage’s total cost is:

Outage value ($$/hour) x Outage duration (hours) = Cost of the Outage ($$$$)
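The formula above can be sketched in a few lines. The function names are hypothetical, and the hourly value used here is not a quoted figure: it is simply back-calculated from the article's own numbers (roughly $30M for a four-hour recovery).

```python
def outage_cost(value_per_hour: float, duration_hours: float) -> float:
    """Cost of the outage = outage value ($/hour) x outage duration (hours)."""
    return value_per_hour * duration_hours


def implied_hourly_value(total_cost: float, duration_hours: float) -> float:
    """Rearrange the same formula to recover the hourly outage value."""
    return total_cost / duration_hours


# Back-calculated illustration: a $30M loss over a four-hour recovery.
hourly_value = implied_hourly_value(30_000_000, 4)
print(hourly_value)                  # 7500000.0
print(outage_cost(hourly_value, 4))  # 30000000.0
```

In other words, a $30M, four-hour event implies an outage value on the order of $7.5M per hour, which is why shortening recovery time matters as much as preventing the failure.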

Some outages, like the America OnLine outage in the early 1990s, served as a clarion call for improvement and reliability. Today, AOL operates some of the most reliable facilities in a world-class manner. Businesses simply are not going to take outages lying down and clients will seek out a measured and appropriate response to the risk facing their particular business.

What Business Initiatives Are Forcing Business Into Developing Mission-Critical Spaces?

Several compelling factors affect all businesses, including:

  • Globalization.
  • The importance of information to employees and customers.
  • Outsourcing.
  • Real-time business continuity.
  • Disaster recovery.

In some cases, business continuity is vital to maintaining your client’s satisfaction. In some cases, it’s required by law, with failures resulting in felony convictions for your client’s senior management. In this day and age, more than your clients are watching you.

What Laws Affect Mission-Critical Operations?

Several uncoordinated laws cover these operations. Thankfully, clarity is emerging on both the jurisdictional and standards ends of this business. The laws include Sarbanes-Oxley, HIPAA, and the ongoing requirements of the FDA and SEC.

After the collapse of Enron, Sarbanes-Oxley (SarbOx) was enacted to discipline public corporations’ financial reporting. While the bulk of the discussion around SarbOx has been the detail and specifics required of year-end financial reporting, it reaches far into any major corporation’s information vaults. Aside from accuracy, data must be retained for years.

What most people do not know is that SarbOx also includes a provision for the physical dispersion of information processing facilities in many industries. This came in response to the 9/11 attacks on New York and the first-ever complete loss of several corporate data centers in lower Manhattan. According to SarbOx, corporate data centers must now possess operating redundancy and must be at least 25 air miles (about 40 km) apart.

The next layer of laws addresses privacy, inter-industry settlement, and customer efficacy. These include the Health Insurance Portability and Accountability Act (HIPAA) and SEC end-of-day settlement requirements for major banks, as well as individual industry requirements. All must be considered when planning, constructing, and operating these spaces. In many cases, several facilities work together to ward off disasters and share the IT workload.

Lastly, local jurisdictions have enacted more stringent environmental laws affecting the operation of mission-critical facilities, and these typically cover air quality and pollution management as well as hazardous material handling, disposal, transportation, and abatement. Thankfully, there’s been progress in defining the various grades of data centers and mission-critical facilities.

Standards and Tier Ratings

Over the past ten years, the industry has striven for consensus on building standards for mission-critical facilities. While the actual definition of each grade of facility varies, it is essentially a performance-based metric, with the classifications known as Tiers. Some of the first work in this area was conducted by consultants in the Phoenix area in the early 1990s, with follow-up work performed by the Uptime Institute. The first Tier ratings were defined as listed below, and much of this thinking is still in use today.

In this nascent standard, the Tiers had these definitions:

  • Tier I — Single path for power and cooling distribution, no redundant components, 99.671 percent availability.
  • Tier II — Single path for power and cooling distribution, redundant components, 99.741 percent availability.
  • Tier III — Multiple power and cooling distribution paths, but only one path active, redundant components, concurrently maintainable, 99.982 percent availability.
  • Tier IV — Multiple active power and cooling distribution paths, redundant components, fault tolerant, 99.995 percent availability.
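To make these availability percentages more tangible, the sketch below converts each one into expected downtime per year. It assumes a non-leap year of 8,760 hours and treats availability as a simple fraction of elapsed time; the names `TIER_AVAILABILITY_PERCENT` and `annual_downtime_hours` are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored (assumption)

# Availability figures taken from the Tier definitions above.
TIER_AVAILABILITY_PERCENT = {
    "Tier I": 99.671,
    "Tier II": 99.741,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}


def annual_downtime_hours(availability_percent: float) -> float:
    """Expected hours per year that a facility is NOT available."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for tier, percent in TIER_AVAILABILITY_PERCENT.items():
    print(f"{tier}: {annual_downtime_hours(percent):.2f} hours/year")
```

Under these assumptions the figures work out to roughly 28.8, 22.7, 1.6, and 0.4 hours per year for Tiers I through IV, so the step from Tier II to Tier III is by far the largest.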

Over time, the type of equipment drove up the power consumption and cooling demands for the data centers. The individual definitions were refined to respect the earlier definition, but now divorce system topology from reliability and availability. The new EIA/TIA 942 standards speak to this, but the new BICSI standard (in development right now, intended to become the new ANSI standard) takes it one step further, to a point where the definitions now look like this:

  • Single path, without redundant components — Single Path/Single Module/Single Source. This is equivalent to the Tier I rating.
    It cannot be maintained while it’s operating, and a failure will likely result in a loss of electrical service to the load. This system has a single electrical supply to the load and no source diversity. Mechanical and vital house loads would be supplied by non-redundant power systems.
  • Single path, with redundant components — Single Source/Multiple Module/Single Path. This is equivalent to the Tier II rating.
    This system may experience a failure while it’s operating due to a lack of redundancy in the distribution system. Redundant components may exist on an N+1 and paralleled basis in the UPS or generator systems, but the system does not offer redundancy in the distribution system. A failure in the N+1 systems would not likely result in a load failure, but would reduce the redundancy level in the paralleled systems to “N.” This system has a single electrical supply to the load and no source diversity, and any failure in the distribution system would likely result in a loss of electrical service to the load. Large-scale system maintenance cannot be performed without interruption to the load. Mechanical and vital house loads would be supplied by non-redundant power systems.
  • Multiple path, with redundant components and dual source critical power — Multiple Source/“N” Rated Single or Multi-Module System/Dual or Multiple Path (Concurrently Maintainable and Operable). This is equivalent to the Tier III rating.
    This system possesses redundancy in the power paths to the critical load. The individual critical power systems are rated for a portion of the total load, with a common and centralized dedicated UPS system providing the “redundant” supply to the “line” systems. The “redundant” system, similar to the “line” systems, may possess either a single or multiple modules.
    The concurrently maintainable system provides load source selection either via static transfer switches or by the internal power supplies in the IT systems themselves. There would be no single points of failure in either the critical power system or the power systems supporting the mechanical or vital house/support loads.
    The concurrently maintainable type of system allows for complete maintenance during normal operations, but loses redundancy or reduces redundancy to “N” when a given system or power path fails or for selected failure or maintenance modes of operations.
  • Multiple path, with redundant components and systems and dual source critical power — Multiple Source or Sources/Multiple “N”- or Better than “N”-Rated Single or Multi Module System/Dual or Multiple Path (Fault-Tolerant). This is equivalent to the Tier IV rating.
    This system possesses redundancy in the power paths and there may be more than two independent sources of UPS power to the critical load. The individual critical power systems are rated for the complete load for the 2N/System-Plus-System option or multiple UPS systems for larger loads, where the system diversity is undertaken solely by the connection of the critical loads to the various/numerous UPS systems. The UPS system would consist of paralleled UPS modules or single/high-kW rotary UPS systems.
    The Fault Tolerant system provides load source selection either via static transfer switches or by the internal power supplies in the IT systems themselves. There would be no single points of failure in either the critical power system or the power systems supporting the mechanical or vital house/support loads.
    The Fault-Tolerant type of system allows for complete maintenance during normal operations, and does not lose redundancy during either failure or maintenance modes of operations.
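The redundancy labels used in these definitions ("N," N+1, and 2N/System-Plus-System) can be illustrated with a small module-count calculation. This is a minimal sketch under stated assumptions: the function name, the load, and the module rating are hypothetical, and real designs also depend on distribution topology, which a simple count does not capture.

```python
import math


def modules_required(load_kw: float, module_kw: float, scheme: str) -> int:
    """Number of UPS modules needed for a given redundancy scheme (hypothetical helper)."""
    # "N" is the number of modules needed just to carry the load.
    n = math.ceil(load_kw / module_kw)
    if scheme == "N":      # no redundant components
        return n
    if scheme == "N+1":    # one spare module on a paralleled basis
        return n + 1
    if scheme == "2N":     # system-plus-system: each side carries the full load
        return 2 * n
    raise ValueError(f"unknown redundancy scheme: {scheme!r}")


# Hypothetical example: a 1,000 kW critical load served by 500 kW modules.
for scheme in ("N", "N+1", "2N"):
    print(scheme, modules_required(1000, 500, scheme))
```

For this example the counts are 2, 3, and 4 modules. Losing one module leaves the N+1 system at "N" with no remaining margin, while the 2N system still has a complete system in reserve.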

Like all business initiatives, a complex mission-critical operation reflects the risk management model and the risk tolerance or acknowledgement of a business’s management team.

What Do These Buildings Typically Look Like And What Goes Into Them?

First of all, they look like most other buildings in a company’s real estate portfolio. Some reside as an interior tenant space within an existing building or on a corporate campus. Some clients “bunker” their data centers, where the site and building are highly secured and, in some cases, blast-, attack-, and storm-resistant. By all measures, these sites are typically dominated by the electrical and network trades.

One of the main challenges to mission-critical spaces is that the technology that resides in them will be changed out several, if not dozens, of times before the building itself retires. In many ways, a data center is nothing more than a factory that makes information. Similarly, the costs to relocate or establish a data center in a new area are considerable, when one accounts for the network, staff, and operating costs of these facilities. So, the built environment must offer flexibility and relevance for many, many years.

For the most part, mission-critical facilities share a common requirement that they must operate with a high degree of survivability and for an extended period of time. Aside from the survivability of the physical building, the operations must be able to operate “off the grid” for an extended period of time (the present standards state 72 hours of continuous operation at full load without a resupply of fuel or water for a Tier IV facility, less for others). This is coupled with utility systems that could be redundant, concurrently maintainable and operable or, on the highest Tier, fault tolerant.
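The 72-hour requirement translates directly into on-site fuel storage. The sketch below shows the arithmetic only: the function name is hypothetical, and the default consumption rate of 0.07 gallons per kWh is a placeholder assumption for a diesel generator, not a figure from the standards. Actual sizing should use the generator manufacturer's data.

```python
def required_fuel_gallons(
    load_kw: float,
    runtime_hours: float = 72.0,     # Tier IV figure cited above; less for other Tiers
    gallons_per_kwh: float = 0.07,   # ASSUMPTION: placeholder consumption rate
) -> float:
    """Fuel needed to carry a constant load for the full runtime without resupply."""
    energy_kwh = load_kw * runtime_hours
    return energy_kwh * gallons_per_kwh


# Hypothetical 1,000 kW facility load held for 72 hours.
print(round(required_fuel_gallons(1000)))
```

With these placeholder inputs the result is on the order of 5,000 gallons, which is why fuel storage and its permitting become site-planning issues early in the project.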

Most people tend to confuse the power and cooling requirements, typically known as power density and expressed on a watts-per-square-foot basis for the actual data center floor space, with the Tier rating. These two metrics have nothing to do with each other. Power density speaks to the IT equipment’s constitution and the resulting power and cooling draw required to safely support it. The Tier rating reflects the redundancy of the power and cooling systems themselves, with the power density reflecting how much power or cooling is actually required in a central utility plant.
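A short calculation shows why the choice of floor area matters when quoting power density. As noted in the Definitions, some parties divide by total raised floor area and others by active floor space. The function name and all figures below are hypothetical.

```python
def power_density_w_per_sf(ups_demand_kw: float, floor_area_sf: float) -> float:
    """Power density in watts per square foot for a given UPS demand and floor area."""
    return ups_demand_kw * 1000 / floor_area_sf


# Hypothetical facility: 1,500 kW of UPS demand on 10,000 SF of raised floor,
# of which 6,000 SF is active space actually occupied by IT equipment.
ups_demand_kw = 1500
print(power_density_w_per_sf(ups_demand_kw, 10_000))  # 150.0 W/SF on raised floor area
print(power_density_w_per_sf(ups_demand_kw, 6_000))   # 250.0 W/SF on active space
```

The same room is either at the 150 W/SF high-density threshold or well above it, depending on the convention, so it is worth confirming which area a quoted figure uses. Note that the Tier rating appears nowhere in this calculation.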

Because technology is the most mercurial tenant of any industrial process, flexibility for future expansion, both for systems and space, is vital to avoid overbuilding at the onset of the building’s life and to keep the mechanical and electrical systems operating at peak efficiency.

What Technology Trends Are Driving Changes?

Dozens of changes affect the data center. New technology typically offers more computing or storage power in an equivalent space. This is known in the IT world as compaction. Unfortunately, these gains are typically trumped by business’s relentless need for more computing power. So IT professionals simply add more systems into the same space. In the end, power and cooling loads increase, sometimes dramatically.

Aside from the natural and inevitable compaction and churn of technology, these are a few more issues that are leading change in the data center:

  • Network and applications convergence.
    Applications are now being hosted and run over the network. Voice over IP telephony (VoIP) is the most common application, and the network manufacturers such as Cisco and Nortel are the main drivers. This is one of the fastest-growing sectors in technology.
  • High-density computing environments.
    Much has been said about this and in the commercial world you’ve certainly heard the terminology – blades, pizza boxes, enterprise servers. Today, we are facing power densities in excess of 600 W/SF. To put it in perspective, a blade cabinet, running at 80 percent of capacity (a common IT “redline” for computing systems) consumes as much power as two household ranges, one stacked on top of the other in a three-foot by three-foot space. Blades and high-density computing systems run many external, customer-facing applications and high-power engineering computation systems in business today.
    High-density computing systems offer the most difficult cooling challenges for any data center, and the key to high density computing appears to be keen control of the supply and return cooling air flow in the data center.
  • Creeping reliability demands.
    You wake up one day to find out that you have spaces and systems that now require 24/7 utility support. This could manifest itself as a vital corporate network node residing in an office building without sufficient power or cooling backup.
  • Grid computing.
    Grid computing is the commercialization of captive computing platforms. Grid computing started many years ago, when the U.S. Government would sell supercomputer time to private industry that had a need for occasional and massive processing power, but did not want to spend the money for its own system. This typically fell into the aerospace, geology, or chemical businesses. Today, grid computing is sold by universities to private industry or shared amongst non-profits. Businesses such as IBM, HP, and Sun also offer grid computing where server farms are rented by the calculation cycle, where 10,000 servers may be harnessed together into a single processing “engine” to deliver a massive computation in a fraction of the time. Grid computing also tends to use high-density computing systems.

What Are the Cost and Schedule Models?

Data centers are infrastructure-intensive undertakings, coupled with some fairly robust base building construction. Cost is related to the size and complexity of mechanical and electrical systems and the overall size of the building. In some cases, a modest Tier II facility might be $250/SF (location neutral), while a high-power-density (such as 300+ W/SF), Tier IV facility might be in excess of $2,000/SF.

Per-square-foot costs are typically rendered as the cost per square foot of the raised floor and the building’s net rentable area. While several factors affect the cost of any mission-critical facility, the size of the mechanical and electrical systems carries the greatest weight. By all measures, the MEP systems account for between 50 and 70 percent of the final construction cost of the project.

In the past, the cost metrics simply did not vary significantly, since the technology that was installed in the building stayed and did not vary from site to site or user to user. Furthermore, high-density computing systems had not been mainstreamed in most corporate IT systems until 1999.

However, with the higher density applications (facilities with a power density of over 150 W/SF), where there is a huge amount of power and cooling being applied to a relatively small area, a $$/kW and $$/ton model works better.
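As a hedged sketch of how the per-kW and per-ton views relate to the per-square-foot view, the code below converts a $/SF cost and a W/SF power density into $/kW, and converts a heat load into tons of cooling. The function names are hypothetical, the inputs are the article's round "in excess of" figures rather than real project data, and the cooling conversion assumes essentially all IT power ends up as heat to be removed.

```python
KW_PER_TON = 3.517  # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW


def cost_per_kw(cost_per_sf: float, watts_per_sf: float) -> float:
    """Convert a $/SF construction cost into $/kW of installed capacity."""
    return cost_per_sf / watts_per_sf * 1000


def cooling_tons(heat_load_kw: float) -> float:
    """Tons of cooling needed to reject a given heat load."""
    return heat_load_kw / KW_PER_TON


# Article's illustrative high-density case: about $2,000/SF at about 300 W/SF.
print(f"${cost_per_kw(2000, 300):,.0f} per kW")
# Hypothetical 1,500 kW IT load expressed as cooling tonnage.
print(f"{cooling_tons(1500):.0f} tons of cooling")
```

Expressed this way, two facilities with very different per-square-foot costs can be compared on the capacity they actually deliver, which is the point of the $/kW and $/ton model.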

Today, costs are better ascertained by breaking the project into its salient components (based on a neutral location).

Tier ratings matter less for the higher-density models, since the sheer equipment census makes topology less relevant than the installed system capacity.

So, the cost of your facility is related to:

  • Overall size.
  • Security and hardening requirements.
  • Tier rating.
  • Power density.
  • Location and work conditions.
  • Project delivery schedule.


Rosendin Electric is consistently ranked in the top ten electrical contractors in the United States, with over $275M in revenue in 2004 and over 80 years of operation. They are one of the largest employee-owned companies in the United States, and Rosendin Electric routinely delivers projects in the high technology, life sciences, health care, entertainment, education, heavy industrial, residential and power generation industries.

William P. Mazzetti, Jr., PE, is the Chief Engineer and Director of Engineering Services for Rosendin Electric, a full-service electrical contracting firm, practicing throughout the United States. Mr. Mazzetti has spent the past 20 years involved in the design and construction of complex commercial, institutional and industrial buildings. His focus for the last fifteen years has been in the conception, design, implementation, testing and operation of mission-critical facilities. To date, Mr. Mazzetti has successfully completed over 300 mission-critical projects. A registered engineer in 14 states, Mr. Mazzetti is also an IEEE Life Member and holds a National Council of Engineering Examiners certification.

Mr. Mazzetti is routinely published in such industry circulars as Consulting-Specifying Engineer, Data Center Manager and Building Operations and Maintenance, and is a frequent speaker at regional and national conferences, such as the 7x24 Exchange and AFCOM Data World. Bill has been an author and reviewer for both of the mission-critical industry’s standards, the EIA/TIA 942 and the new BICSI Data Center Standard. He may be reached at bmazzetti@rosendin.com.

 
1331 Pennsylvania Avenue, NW, 4th Floor, Washington, DC 20004
Phone 202-682-0110 - Toll Free 866-692-0110 - Fax 202-682-5877