Using Failure History and MTBF to Right-Size Your PM Intervals
Why the OEM Interval Is a Starting Point, Not the Answer
The manual says change the gearbox oil every 2,000 hours. You've been doing it every 2,000 hours for four years. Last spring the gearbox still seized — at 1,740 hours — and took 11 hours to rebuild.
OEM intervals are written for average duty cycles in average environments under average loads. Your plant is not average. Your coolant might run hotter, your press brake might cycle faster, or your conveyor might see abrasive fines the OEM didn't model. When the interval and the real failure pattern don't match, you're either over-maintaining (replacing parts that had life left) or under-maintaining (running into failures you could have prevented).
The fix is to use your own failure history, the only data that actually reflects your conditions, to calculate an MTBF-derived interval and then set your PM trigger at a defensible fraction of that number. By the end of this article you'll have the formula, a worked example, and a method for deciding where to place your PM trigger relative to MTBF.
Always confirm resulting intervals against the equipment's OEM documentation and any applicable standards (for example ASHRAE, NFPA 70B, and OSHA requirements, with ISO 55000 as the asset-management framework) before locking them into a schedule. MTBF-derived intervals are a data-informed starting point; specific requirements vary by equipment, duty cycle, industry, and jurisdiction.
Step 1 — Build a Usable Failure Log
You cannot calculate a meaningful MTBF without a failure log. A failure log does not have to be sophisticated — a spreadsheet row per failure event with five fields is enough to start:
| Field | What to record |
|---|---|
| Asset ID | Unique identifier for the machine |
| Failure date | The date (or shift) the failure occurred |
| Return-to-service date | When the asset was running again |
| Failure mode | One sentence on what broke and why (if known) |
| Operating hours at failure | Meter reading or estimated hours since last reset |
If you don't have operating hours, use calendar days as a proxy: imperfect, but better than nothing. Three to five failure events for a given failure mode on an asset give you a usable MTBF baseline; more events narrow the uncertainty.
Two practical traps to avoid:
- Include only functional failures, not planned PM tasks. A PM task is not a failure. Counting PMs as failures inflates your failure count and produces an artificially low — and too-conservative — MTBF.
- Keep failure modes separate. A bearing failure and a seal leak on the same machine are two independent failure modes. Aggregate them and your MTBF is meaningless for interval-setting purposes. Calculate one MTBF per failure mode per asset.
A persistent PM history log is the most straightforward way to build this record systematically rather than reconstructing it from memory after the fact.
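As a rough illustration of the five-field log and the two traps above, here is a minimal Python sketch. The class name, the field names, the `is_planned_pm` flag, and the sample rows are all hypothetical, chosen for this example rather than taken from any particular CMMS or workbook.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class FailureEvent:
    """One row of the failure log (the five fields from the table above)."""
    asset_id: str
    failure_date: date
    return_to_service_date: date
    failure_mode: str
    operating_hours_at_failure: float
    # Hypothetical extra flag: lets a mixed maintenance history mark planned
    # PM rows so they are never counted as failures.
    is_planned_pm: bool = False


def group_functional_failures(events):
    """Group functional failures by (asset_id, failure_mode), skipping planned PMs."""
    groups = defaultdict(list)
    for event in events:
        if event.is_planned_pm:
            continue  # a PM task is not a failure
        groups[(event.asset_id, event.failure_mode)].append(event)
    return dict(groups)


# Made-up sample rows, for illustration only.
log = [
    FailureEvent("PRESS-01", date(2023, 3, 14), date(2023, 3, 15), "bearing", 1150.0),
    FailureEvent("PRESS-01", date(2023, 9, 2), date(2023, 9, 2), "seal leak", 2300.0),
    FailureEvent("PRESS-01", date(2023, 11, 20), date(2023, 11, 20), "bearing", 2390.0,
                 is_planned_pm=True),
    FailureEvent("PRESS-01", date(2024, 1, 8), date(2024, 1, 9), "bearing", 2410.0),
]

groups = group_functional_failures(log)
for (asset_id, failure_mode), rows in groups.items():
    print(asset_id, failure_mode, len(rows))
```

Grouping by the `(asset_id, failure_mode)` pair keeps the bearing and seal events apart, so each group can feed its own MTBF calculation.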
Step 2 — Calculate MTBF from Your Data
MTBF = Total operating time ÷ Number of failures
"Total operating time" is the cumulative time the asset was available and running across the observation window — not calendar time, not downtime. If an asset ran 4,800 hours over two years and experienced 4 bearing failures in that window:
MTBF = 4,800 hours ÷ 4 failures = 1,200 hours per failure
That single number tells you that, on average, this bearing fails every 1,200 operating hours under your conditions. The OEM might say 2,000 hours. Your data says 1,200. Those two figures are not in conflict — they're measuring different things. The OEM is quoting a design target for rated conditions; your MTBF is measuring actual performance in your environment.
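The same arithmetic can be sketched as a small Python function. The function name and the input checks are assumptions for illustration; the formula is the one given above.

```python
def mtbf(total_operating_hours: float, failure_count: int) -> float:
    """MTBF = total operating time / number of failures, for one failure mode."""
    if total_operating_hours <= 0:
        raise ValueError("total operating hours must be positive")
    if failure_count <= 0:
        # With no failures in the window, the formula has nothing to divide by.
        raise ValueError("MTBF is undefined with zero failures in the window")
    return total_operating_hours / failure_count


# Worked example from the text: 4,800 operating hours, 4 bearing failures.
print(mtbf(4800, 4))  # 1200.0
```

Raising an error on zero failures is a design choice for this sketch: it forces the caller to fall back on the OEM interval explicitly instead of silently dividing by zero.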
A deeper walkthrough of the MTBF formula, including how to handle multi-asset fleets and partial observation windows, is in How to Calculate MTBF. If you want a pre-built workbook that calculates MTBF, MTTR, and OEE from your log entries, the MTBF/MTTR/OEE Calculator Workbook handles the arithmetic and formats the outputs for a maintenance review.
Step 3 — Place Your PM Trigger at the Right Fraction of MTBF
Knowing your MTBF is not the same as knowing your PM interval. If you set the PM interval exactly equal to MTBF, you're scheduling maintenance at the average time of failure, so a large share of your assets (roughly half if failure times are spread symmetrically around the mean, and about 63% if the failure rate is constant) will fail before the PM arrives. That's not preventive maintenance; that's a coin flip.
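To put a rough number on that risk, the sketch below assumes failure times follow a Weibull distribution whose mean equals the MTBF. That distribution choice and the `shape` values are assumptions introduced for illustration; the article's own data does not establish them, and fitting a shape parameter properly needs more failure events than a small log provides.

```python
import math


def fraction_failed_by(t_hours: float, mtbf_hours: float, shape: float = 1.0) -> float:
    """Weibull CDF at t_hours, with the scale set so the mean life equals mtbf_hours.

    shape = 1.0 is a constant failure rate (exponential); shape > 1 models wear-out.
    """
    scale = mtbf_hours / math.gamma(1.0 + 1.0 / shape)
    return 1.0 - math.exp(-((t_hours / scale) ** shape))


# Share of units expected to fail before a PM placed exactly at MTBF = 1,200 hours,
# for a few assumed shape values.
for shape in (1.0, 2.0, 3.5):
    print(shape, round(fraction_failed_by(1200, 1200, shape), 3))
```

With `shape=1.0` the result is 1 − e⁻¹, about 63%. Passing a shorter `t_hours` (for example 600 or 780) shows how much of that exposure an earlier trigger removes under the same assumption.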
A common practice is to set your PM trigger at a conservative fraction of MTBF:
- 50% of MTBF — high-criticality assets, expensive failures, safety implications, or high process variability. Expensive in PM frequency but provides the widest buffer.
- 60%–70% of MTBF — a common middle ground for most production-critical assets. Enough buffer against early failures while not over-maintaining.
- 80% of MTBF — lower-criticality assets, failures with low consequence, or failure modes with a reliable early-warning signal (temperature, vibration, noise) that can catch drift before the PM arrives.
Continuing the worked example above (MTBF = 1,200 hours):
| Trigger fraction | Resulting PM interval |
|---|---|
| 50% | 600 hours |
| 65% | 780 hours |
| 80% | 960 hours |
If this bearing is on a production-critical press and an unplanned failure costs four hours of line downtime, a 65% trigger (780 hours) is a reasonable starting point. If the asset is a secondary conveyor with a hot-standby spare, 80% (960 hours) may be appropriate.
There is no universal "correct" fraction — it is a judgment call that weighs criticality, failure consequence, spare availability, and labor cost. What matters is that the fraction is explicit and documented, not intuitive and forgotten.
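One way to keep the fraction explicit and documented is to store it next to the calculation. The sketch below is illustrative only: the tier names and the `TRIGGER_FRACTION` mapping are hypothetical, with 0.65 standing in for the 60%–70% middle band.

```python
# Hypothetical criticality tiers mapped to the trigger fractions discussed above.
TRIGGER_FRACTION = {
    "high": 0.50,    # expensive or safety-relevant failures
    "medium": 0.65,  # stand-in for the 60%-70% middle ground
    "low": 0.80,     # low consequence, or a reliable early-warning signal
}


def pm_interval(mtbf_hours: float, trigger_fraction: float) -> float:
    """PM interval in operating hours = MTBF x trigger fraction."""
    if not 0 < trigger_fraction <= 1:
        raise ValueError("trigger fraction must be in (0, 1]")
    return mtbf_hours * trigger_fraction


# Worked example from the text: MTBF = 1,200 hours.
for tier, fraction in TRIGGER_FRACTION.items():
    print(f"{tier}: {fraction:.0%} of MTBF -> {pm_interval(1200, fraction):.0f} hours")
```

Keeping the mapping in one place means a change of policy (say, moving the middle tier to 0.70) is a single documented edit instead of a scattered set of hand-entered intervals.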
For assets where the MTBF-derived interval is materially shorter than the OEM interval, you have a data-informed case to override the OEM recommendation. The PM Interval Override guide covers how to document that decision and what conditions should prompt a re-evaluation.
Step 4 — Account for Premature Replacement Cost
Setting the PM trigger too conservatively (say, 40% of MTBF on a low-criticality asset) has a real cost: you're discarding parts that have usable life remaining. This is particularly visible for consumables — filters, belts, seals, bearings — where the part cost is non-trivial and the replacement happens at every PM.
A simple per-PM cost model:
Per-PM cost = (Labor hours × Labor rate) + Parts cost
Illustrative example: A PM task takes 1.5 hours, and parts run $85 per event. The labor rate used here is the BLS median for machinery maintenance workers, $27.57/hr (BLS OEWS, May 2023, SOC 49-9043). Treat it as a reference rate only; your actual rate will vary and should be entered from your own records.
Per-PM cost = (1.5 hrs × $27.57/hr) + $85 = $41.36 + $85 = $126.36 per PM
Assume the asset runs 4,800 hours/year (twice the utilization of the Step 2 example, which logged 4,800 hours over two years). At a 600-hour interval, that's 8 PM events/year → $1,010.88/year in PM cost for this one task.
At a 780-hour interval: 4,800 ÷ 780 ≈ 6.15 events → approximately $778/year.
The difference, roughly $233/year on a single task, may seem small in isolation. Across a fleet of 40 assets with similar tasks, the aggregate impact of interval calibration becomes material. The cost of premature part replacement article walks through how to model this across a fleet.
The point is not to chase the longest possible interval. It's to make the cost of each interval choice visible so the decision is deliberate.
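The cost model above is simple enough to script, which makes it easy to compare intervals side by side. This is a minimal sketch using Python's `decimal` module so that cents round predictably; the function names are hypothetical, and the inputs are the illustrative figures from the example, not recommended values.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def per_pm_cost(labor_hours: Decimal, labor_rate: Decimal, parts_cost: Decimal) -> Decimal:
    """Per-PM cost = (labor hours x labor rate) + parts cost, rounded to cents."""
    return (labor_hours * labor_rate + parts_cost).quantize(CENT, rounding=ROUND_HALF_UP)


def annual_pm_cost(cost_per_pm: Decimal, annual_operating_hours: int,
                   interval_hours: int) -> Decimal:
    """Annual cost = per-PM cost x (annual operating hours / PM interval)."""
    events_per_year = Decimal(annual_operating_hours) / Decimal(interval_hours)
    return (cost_per_pm * events_per_year).quantize(CENT, rounding=ROUND_HALF_UP)


# Illustrative inputs from the example: 1.5 h labor, $27.57/h, $85 parts.
cost = per_pm_cost(Decimal("1.5"), Decimal("27.57"), Decimal("85"))
for interval in (600, 780, 960):
    print(interval, annual_pm_cost(cost, 4800, interval))
```

Carrying the unrounded event count (4,800 ÷ 780 ≈ 6.154) gives $777.60 for the 780-hour case, which is why hand calculations that round the event count first can differ by a few cents to a dollar.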
Step 5 — Revise Intervals as the Failure History Grows
MTBF is not a set-it-and-forget-it number. It should update as your failure log grows.
Two practical triggers for an MTBF re-evaluation:
- A new failure event. Each failure adds a data point. Recalculate MTBF after every event, especially if the new failure occurred much later than expected (suggesting conditions have changed) or much earlier than expected (suggesting the interval is already too long).
- A process or environmental change. New product mix, higher shift utilization, different ambient temperature, supplier change on a consumable — any of these can shift your real failure rate away from the historical baseline.
A living failure log turns interval-setting from a one-time OEM lookup into an iterative, data-driven PM interval process. That's the core discipline of reliability-based maintenance interval management: the interval is a hypothesis, the failure history is the test, and periodic recalculation is the update cycle.
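The update cycle can be sketched as a recalculation plus a check on how far the number moved. The 10% review tolerance below is a hypothetical threshold chosen for illustration, not a figure from this article or from any standard; the MTBF itself is still recalculated after every event.

```python
def updated_mtbf(total_operating_hours: float, failure_count: int,
                 hours_since_last_failure: float) -> float:
    """Fold one new failure event into the running totals and return the new MTBF."""
    return (total_operating_hours + hours_since_last_failure) / (failure_count + 1)


def needs_interval_review(old_mtbf: float, new_mtbf: float,
                          tolerance: float = 0.10) -> bool:
    """True when MTBF shifted by more than `tolerance` (hypothetical 10% default)."""
    return abs(new_mtbf - old_mtbf) / old_mtbf > tolerance


# Baseline from the worked example: 4,800 hours, 4 failures, MTBF = 1,200 hours.
baseline = 4800 / 4

# Made-up scenario: the next failure arrives after only 400 more operating hours.
new_value = updated_mtbf(4800, 4, 400)
print(new_value, needs_interval_review(baseline, new_value))
```

In this made-up scenario the MTBF drops from 1,200 to 1,040 hours, a shift of about 13%, so the PM interval derived from it would be flagged for review.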
The Preventive Maintenance Interval and Cost Guide covers the broader interval-setting framework, including how to handle assets where failure history is thin and OEM data remains the primary reference.
Putting It Together: A Simple MTBF-to-Interval Workflow
- Log every functional failure — date, asset, failure mode, operating hours.
- Calculate MTBF per failure mode per asset — total operating hours ÷ number of failures.
- Choose a trigger fraction based on criticality and failure consequence (50%–80% of MTBF).
- Model the per-PM cost at that interval — labor + parts — and compare it against the estimated cost of an unplanned failure for the same mode.
- Set the interval, document the reasoning, and flag it for re-evaluation after the next failure event or any significant process change.
- Cross-check against OEM documentation and applicable standards before finalizing. MTBF-derived intervals inform the decision; the OEM manual and relevant standards (and, where applicable, a qualified engineer or compliance authority) have the final word.
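The cost comparison in the workflow above can be roughed out as follows. Every figure marked PLACEHOLDER is invented for illustration (the downtime cost per hour and the repair cost in particular), and the sketch makes the simplifying assumption that the asset would otherwise run to failure at the MTBF rate; it is a framing aid, not a full cost model.

```python
def annual_cost(events_per_year: float, cost_per_event: float) -> float:
    """Annual cost = events per year x cost per event."""
    return events_per_year * cost_per_event


annual_hours = 4800        # operating hours per year (illustrative)
mtbf_hours = 1200          # from the worked example
pm_interval_hours = 780    # 65% trigger from the worked example
cost_per_pm = 126.36       # from the per-PM cost example

# PLACEHOLDER failure cost: 4 h of downtime at $500/h plus $300 for the repair.
cost_per_failure = 4 * 500.0 + 300.0

pm_spend = annual_cost(annual_hours / pm_interval_hours, cost_per_pm)
run_to_failure_spend = annual_cost(annual_hours / mtbf_hours, cost_per_failure)

print(f"Annual PM cost at {pm_interval_hours} h: ${pm_spend:,.2f}")
print(f"Annual run-to-failure cost: ${run_to_failure_spend:,.2f}")
```

In practice a PM program does not eliminate every failure, so the gap between the two numbers is an upper bound on the benefit, not a forecast.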
From Spreadsheet to Persistent Schedule
A failure log in a spreadsheet and an MTBF calculation in a free calculator widget will get you through steps 1–3 above. Where the approach breaks down is at scale: past roughly ten assets, cross-referencing failure logs, recalculating MTBF, updating PM due-dates, and tracking fleet-level annual PM cost across a spreadsheet becomes fragile, with version drift, manual errors, and intervals that go stale between reviews.
A persistent, multi-asset calculation engine keeps the failure-derived interval, the PM due-date, and the projected annual cost for each asset in one place — recalculating automatically as inputs change, rolling up to a fleet-level cost view, and flagging assets whose next PM is overdue. That's the gap between a one-time calculator widget and a tool that does the math continuously across a fleet.
If you're ready to move the calculation out of the spreadsheet, the 14-day free trial lets you build your asset registry, enter MTBF-derived intervals, and see the fleet-level annual cost projection before committing to a plan. If you'd rather stay in Excel for now, the MTBF/MTTR/OEE Calculator Workbook is a structured starting point for the calculations in steps 1 and 2 — with the MTBF, MTTR, and OEE formulas pre-built and ready for your failure-log data.
Either way, the interval should come from your data — not the manual's default.