Today, ICT equipment is typically involved in e-commerce, market transactions, financial settlement processes or other activities with quality-of-service commitments. In such situations, loss of its availability is intolerable – and as ICT equipment is entirely dependent on its power supply, UPS availability also becomes mission critical. With this in mind, this article looks at what availability actually means and how best to achieve it. We also see why increasing reliability, although important, is only one step in achieving the availability that’s required.
Availability and reliability
Data centre operators care primarily about availability because it is a measure of how much time per year their ICT resource is operational and available. It is formally defined as
Availability = MTBF / (MTBF + MTTR)
where MTBF = Mean Time Between Failures and MTTR = Mean Time To Repair. This equation shows that we can increase availability by reducing MTTR as well as by increasing MTBF, and we will see that the best results come from employing both of these strategies.
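As a minimal sketch of the relationship above, the following Python snippet computes availability and the corresponding expected downtime per year. The function names and the sample figures (100,000 hours MTBF, 6 hours MTTR) are illustrative assumptions, not measurements of any particular UPS.

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years ignored for simplicity


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), returned as a fraction of 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Average hours per year for which the system is unavailable."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Illustrative (assumed) starting point
baseline = availability(100_000, 6)

# Two ways to improve on it
double_mtbf = availability(200_000, 6)  # make failures half as frequent
halve_mttr = availability(100_000, 3)   # make each repair twice as fast

print(f"baseline:    {baseline:.6%}")
print(f"double MTBF: {double_mtbf:.6%}")
print(f"halve MTTR:  {halve_mttr:.6%}")
```

Because availability depends only on the ratio of MTTR to MTBF, halving the repair time has the same effect as doubling the time between failures, which is why both levers are worth pursuing.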
MTBF is ultimately based on reliability, so we should start by increasing this. A UPS system’s reliability is the probability that it can perform its designed function of supplying uninterrupted, clean power over a given time period. This reliability is driven by the quality of the components used, and improves with better-quality, more expensive devices. However, as spending on components increases, reliability reaches a plateau where further spending is no longer rewarded by further improvement – even the best components have a limit. At this point we need another tool to drive a further increase in MTBF.
Fault tolerance and availability
The answer is to build a fault-tolerant system: one that will continue to deliver uninterrupted power to its critical load even if one component fails. Fault tolerance can be achieved using redundant configurations. Imagine, for example, a 120 kVA load served by two free-standing UPS units, each of 120 kVA capacity. Either unit can continue to fully support the load if the other fails; through such fault tolerance, the MTBF of the total UPS installation is significantly better than that of a single unit entirely dependent on the reliability of its own components. While a single UPS unit might achieve an MTBF of 50,000 hours to 200,000 hours, a fault-tolerant redundant system could achieve 1,250,000 hours, depending on its configuration. This effect is shown in Fig 1.
Fig 1: Effect of component quality and redundancy on UPS MTBF
Such configurations are generically known as N+n redundant systems, where N (Need) is the number of UPS units essential to support the critical load, and n is the number of redundant units. Accordingly, our example comprises a 1+1 redundant configuration. Although, as we have shown, this improves MTBF and therefore availability, it’s not the best possible solution in terms of efficiency and cost. Data centre managers are constantly under pressure to extract the best possible power availability from the least possible budget and floor space, and with the UPS technology now available we can take more steps to help them.
Firstly, consider our 1+1 redundant configuration; by definition it can never be more than 50% loaded. This is highly inefficient in both capital cost and operating cost terms. A better solution is to configure a 4+1 system, which can run at up to 80% loading. Increasing the load like this can improve the UPS’s efficiency and reduce running costs, while capital expenditure is reduced as less excess capacity is being purchased. For our 120 kVA example, we could achieve a 4+1 configuration using five free-standing 30 kVA units, any four of which could deliver 120 kVA if one unit fails. In this scenario, the 4+1 configuration does have one disadvantage compared with its 1+1 alternative: as it has more components, its MTBF is reduced from 1,250,000 hours to 500,000 hours. We have therefore improved efficiency at the cost of reduced MTBF and availability, although this effect can be addressed by reducing MTTR.
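The loading figures quoted above follow from simple arithmetic, sketched below in Python. The function names are hypothetical, and the sketch assumes that all units share the load equally and that the N needed units are sized to match the critical load exactly.

```python
def max_unit_loading(n_needed: int, n_redundant: int) -> float:
    """Highest fraction of its rating each unit carries when all units are
    healthy and share the load equally (assumed)."""
    return n_needed / (n_needed + n_redundant)


def unit_rating_kva(load_kva: float, n_needed: int) -> float:
    """Rating of each unit so that n_needed units together carry the full load."""
    return load_kva / n_needed


def installed_capacity_kva(load_kva: float, n_needed: int, n_redundant: int) -> float:
    """Total capacity purchased, including the redundant units."""
    return unit_rating_kva(load_kva, n_needed) * (n_needed + n_redundant)


LOAD_KVA = 120  # the example load used in this article

for n_needed, n_redundant in [(1, 1), (4, 1)]:
    loading = max_unit_loading(n_needed, n_redundant)
    rating = unit_rating_kva(LOAD_KVA, n_needed)
    total = installed_capacity_kva(LOAD_KVA, n_needed, n_redundant)
    print(f"{n_needed}+{n_redundant}: {n_needed + n_redundant} x {rating:.0f} kVA units, "
          f"{total:.0f} kVA installed, max loading {loading:.0%}")
```

For the 120 kVA load, the 1+1 arrangement installs 240 kVA of capacity, whereas the 4+1 arrangement installs 150 kVA, which is where the capital saving comes from.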
Hot swappability and reduced MTTR
We can optimise our efficiency level, improve availability and reduce our floor space requirement as well by turning to modern, modular UPS technology. By using solid-state IGBT devices, UPSs can dispense with output transformers. The extent of weight and space this saves is so significant that a transformerless 30 kVA UPS unit can be implemented as a slide-in rack module rather than as a free standing floor unit. We can now build our 120 kVA 4+1 redundant configuration vertically as five modules within a single 19” frame occupying minimal floor space.
However, this rack-mount modular approach also offers more significant advantages because the modules can be ‘hot swapped’, that is, removed and replaced without taking the system offline. This reduces the MTTR to around half an hour, compared with the six hours typically needed to repair a free-standing unit. This has an important impact on availability which, as we showed earlier, can be improved by reducing MTTR as well as by increasing MTBF. Table 1 below illustrates the effect of these factors – ‘1+1’ vs ‘4+1’ redundancy, and hot swap modularity – on UPS availability. The availability figures have been obtained by using the equation below and expressing the results as a percentage.
Availability = MTBF / (MTBF + MTTR)
Table 1: Effects of redundancy and modularity on availability
| |Free Standing System, no Redundancy|1+1 Redundant Free Standing System|4+1 Redundant Free Standing System|4+1 Redundant Rack Mount Modular System|
|---|---|---|---|---|
|MTBF (hours)|Up to 200,000|Up to 1,250,000|500,000|500,000|
From Table 1 we can see that redundancy, with its fault tolerance, improves availability. A 4+1 redundant free-standing system has lower availability than a 1+1 configuration but, as we have shown, it is more energy-efficient. However, managers of data centres and other mission-critical ICT installations can obtain the best possible power protection by choosing a ‘hot swap’ rack-mount configuration such as the 4+1 example in the Table. It benefits significantly from its reduced MTTR, and offers the best efficiency from the least floor space and, at 99.9999%, the best availability. This figure is sometimes referred to as ‘six nines’ – an industry-accepted way of expressing high availability.
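The comparison can be reproduced with a short Python sketch that applies the availability equation to the MTBF figures in Table 1. It assumes the MTTR values quoted earlier in this article (six hours for free-standing units, half an hour for hot-swappable modules) and takes the upper-bound MTBF where the table says "up to". The helper names, including `count_nines`, are hypothetical.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), returned as a fraction of 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def count_nines(avail: float) -> int:
    """Number of leading nines in the availability, e.g. 0.9995 -> 3."""
    return math.floor(-math.log10(1.0 - avail))


# (label, MTBF in hours from Table 1, assumed MTTR in hours from the text)
configurations = [
    ("Free standing, no redundancy", 200_000, 6.0),
    ("1+1 redundant free standing", 1_250_000, 6.0),
    ("4+1 redundant free standing", 500_000, 6.0),
    ("4+1 redundant rack mount modular", 500_000, 0.5),
]

for label, mtbf, mttr in configurations:
    a = availability(mtbf, mttr)
    print(f"{label:34s} {a * 100:.4f}%  ({count_nines(a)} nines)")
```

Under these assumptions, the modular 4+1 case works out at about 99.9999%, consistent with the ‘six nines’ figure quoted above, even though its MTBF is lower than that of the 1+1 free-standing system.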