MTBF, MTTR, and OEE Explained: The Three Numbers Every Maintenance Manager Tracks
Why These Three Numbers Show Up on Every Maintenance Dashboard
It is the kind of Tuesday nobody plans for. A CNC lathe or a packaging line goes down mid-shift, and by the time the technician has diagnosed the fault, sourced the part, and restarted the machine, four hours are gone. The shift supervisor wants to know whether this is unusual. The plant manager wants to know what it costs. The maintenance manager wants to know whether the PM interval is set right — or whether the next failure is already overdue on two other assets.
Three metrics answer those questions in a common language that travels from the shop floor to the budget meeting: MTBF (mean time between failures), MTTR (mean time to repair), and OEE (overall equipment effectiveness). Together they form a short diagnostic loop — how often does the equipment fail, how long does it take to recover, and what is the net production impact. Get comfortable with all three and you can set PM intervals with a rational basis, defend a maintenance budget with numbers, and spot an asset that is quietly dragging down throughput before it becomes a crisis.
This guide defines each metric plainly, shows the formula and a worked example, explains what a good number looks like versus a warning sign, and connects all three back to how you set PM intervals and forecast maintenance cost.
MTBF: Mean Time Between Failures
MTBF is the average operating time an asset delivers between one failure and the next. It is expressed in hours (or days or cycles, depending on your equipment) and is the closest thing maintenance has to a reliability score for a single asset.
The formula
MTBF = Total operating time ÷ Number of failures
"Total operating time" is the time the asset was actually running — scheduled downtime, planned maintenance windows, and idle time are typically excluded, though your facility may define this differently. "Number of failures" is the count of unplanned stops requiring corrective action over that same period.
Worked example
Illustrative inputs: A press brake ran for 4,200 operating hours over the past 12 months and experienced 7 unplanned failures.
MTBF = 4,200 hr ÷ 7 failures = 600 hr per failure
That 600-hour figure now has a practical use: if you have been setting the PM interval on that press brake at 500 hours, you are doing preventive work before the historical failure point — which is roughly where an interval should sit. If the interval were set at 700 hours, you would be scheduling PM after the average failure event, which means you are effectively running to failure while calling it preventive maintenance.
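The worked example above can be reproduced in a few lines. This is a minimal sketch, not a prescribed tool: the function name `mtbf_hours` is an assumption made for illustration, and the inputs are the press brake figures from the example.

```python
def mtbf_hours(operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating time divided by unplanned failures."""
    if failures <= 0:
        # With zero failures in the period the mean is undefined, not infinite.
        raise ValueError("MTBF is undefined with zero failures in the period")
    return operating_hours / failures


# Press brake inputs from the worked example.
press_brake_mtbf = mtbf_hours(operating_hours=4200, failures=7)
print(f"MTBF = {press_brake_mtbf:.0f} hr")  # MTBF = 600 hr

# Compare candidate PM intervals against the historical average failure point.
for interval in (500, 700):
    position = "before" if interval < press_brake_mtbf else "after"
    print(f"PM at {interval} hr falls {position} the average failure point")
```

Raising an error on zero failures is a design choice in this sketch; an asset with no recorded failures needs a longer observation window rather than a made-up number.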
MTBF is also the metric that makes PM interval decisions defensible. When someone asks why you changed the oil at 480 hours instead of waiting for the 600-hour OEM recommendation, you can show them the asset's own failure history and explain that its operating conditions (duty cycle, environment, load profile) are producing failures earlier than the OEM's general guidance anticipated. Always confirm any interval adjustment against the OEM manual and the relevant standards for your equipment type — MTBF gives you the conversation starter, not the final word.
For a full step-by-step walkthrough of data collection, edge cases (assets with zero failures, newly installed equipment), and how to calculate a fleet-level reliability profile, see How to Calculate MTBF.
MTTR: Mean Time to Repair
MTTR is the average time it takes to restore an asset to operational condition after a failure. Where MTBF measures how reliable the equipment is, MTTR measures how responsive and capable your maintenance operation is when reliability fails.
The formula
MTTR = Total repair time ÷ Number of repairs
"Total repair time" runs from the moment the failure is confirmed to the moment the asset is back in production — it includes diagnosis, waiting for parts, hands-on repair, and any required testing or sign-off before restart. Some facilities exclude parts-wait time from MTTR and track it separately; be consistent whichever convention you choose.
Worked example
Illustrative inputs: Over the same 12-month period, those 7 press brake failures required the following repair durations (in hours): 3, 5, 2, 8, 4, 6, 3.
Total repair time = 3 + 5 + 2 + 8 + 4 + 6 + 3 = 31 hr
MTTR = 31 hr ÷ 7 repairs = 4.4 hr per repair
A 4.4-hour average repair time is a baseline. Whether it is good or poor depends on the asset type, the failure mode, and your parts strategy — MTTR does not have a universal world-class threshold the way OEE does. What matters is the trend: if MTTR on this asset climbs from 4.4 hours to 7 hours over the next two quarters, something has changed — parts are harder to source, the failure modes are becoming more complex, or the technician knowledge for this machine is walking out the door.
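The same MTTR arithmetic, sketched in Python for anyone who keeps repair durations in a log rather than a spreadsheet. The function name `mttr_hours` is hypothetical; the durations are the seven press brake repairs listed above.

```python
def mttr_hours(repair_durations_hr):
    """Mean time to repair: total repair time divided by number of repairs."""
    if not repair_durations_hr:
        raise ValueError("MTTR needs at least one repair record")
    return sum(repair_durations_hr) / len(repair_durations_hr)


# Repair durations (hours) for the 7 press brake failures.
repair_hours = [3, 5, 2, 8, 4, 6, 3]

total_repair = sum(repair_hours)
mttr = mttr_hours(repair_hours)

print(f"Total repair time = {total_repair} hr")  # Total repair time = 31 hr
print(f"MTTR = {mttr:.1f} hr per repair")        # MTTR = 4.4 hr per repair
```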
Research from ABB's Value of Reliability survey (2023, Sapio Research) pegged the typical cost of unplanned downtime at around $125,000 per hour, with two-thirds of industrial facilities reporting at least one significant downtime event per month. Even if your facility's downtime cost is a fraction of that figure, the arithmetic of MTTR × downtime cost per hour is usually the fastest way to justify spare-parts stocking, cross-training, or a revised PM interval.
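A rough sketch of that MTTR × cost-per-hour arithmetic follows. The $125,000 rate is the survey figure quoted above; the $5,000 rate is a purely hypothetical stand-in for a smaller facility, so substitute your own number.

```python
def downtime_cost_per_incident(mttr_hr: float, cost_per_hr: float) -> float:
    """Average downtime cost of one failure: repair hours times hourly downtime cost."""
    return mttr_hr * cost_per_hr


MTTR_HR = 4.4            # press brake MTTR from the worked example
SURVEY_RATE = 125_000    # per-hour figure quoted from the survey above
FACILITY_RATE = 5_000    # hypothetical rate for illustration only

at_survey_rate = downtime_cost_per_incident(MTTR_HR, SURVEY_RATE)
at_facility_rate = downtime_cost_per_incident(MTTR_HR, FACILITY_RATE)

print(f"At survey rate:   ${at_survey_rate:,.0f} per incident")    # $550,000
print(f"At facility rate: ${at_facility_rate:,.0f} per incident")  # $22,000
```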
Reducing MTTR comes from two directions: faster diagnosis (better documentation, visual management, troubleshooting guides at the machine) and faster parts availability. Research from Oxmaint (2026) found that 20%–30% of total downtime duration is tied directly to parts availability, meaning that roughly a fifth to nearly a third of the repair clock is often just waiting, not working.
For a detailed breakdown of MTTR calculation, common definitional pitfalls, and how to use MTTR as a staffing and spares justification tool, see How to Calculate MTTR.
OEE: Overall Equipment Effectiveness
OEE is the single number that combines how often equipment is available, how fast it runs when available, and how much of what it produces meets quality standards. It is the most comprehensive of the three metrics and the one most likely to land in a leadership report.
The formula
OEE = Availability × Performance × Quality
Each component is a percentage (expressed as a decimal in the multiplication):
- Availability = (Planned production time − Unplanned downtime) ÷ Planned production time
- Performance = (Actual output × Ideal cycle time) ÷ Run time, where Run time = Planned production time − Unplanned downtime
- Quality = Good units produced ÷ Total units produced
Worked example
Illustrative inputs: A fabrication cell has 480 minutes of planned production time in a shift. It lost 60 minutes to unplanned downtime (Availability = 420 ÷ 480 = 87.5%). Running time produced 380 units against an ideal rate of 1 unit/minute (Performance = 380 ÷ 420 = 90.5%). Of 380 units, 370 passed quality inspection (Quality = 370 ÷ 380 = 97.4%).
OEE = 0.875 × 0.905 × 0.974 = 0.771 → 77.1%
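The fabrication cell example can be checked with a short script. This is a sketch under the example's own assumptions: Performance is measured against run time (the 420 minutes left after downtime), as the worked example does, and the function name `oee_components` is invented for illustration.

```python
def oee_components(planned_min, unplanned_down_min, total_units, good_units, ideal_cycle_min):
    """Return (availability, performance, quality, oee) as fractions."""
    run_time = planned_min - unplanned_down_min
    availability = run_time / planned_min
    performance = (total_units * ideal_cycle_min) / run_time
    quality = good_units / total_units
    return availability, performance, quality, availability * performance * quality


# Shift inputs from the fabrication cell example (ideal rate of 1 unit per minute).
a, p, q, score = oee_components(
    planned_min=480, unplanned_down_min=60,
    total_units=380, good_units=370, ideal_cycle_min=1.0,
)

print(f"Availability = {a:.1%}")      # Availability = 87.5%
print(f"Performance  = {p:.1%}")      # Performance  = 90.5%
print(f"Quality      = {q:.1%}")      # Quality      = 97.4%
print(f"OEE          = {score:.1%}")  # OEE          = 77.1%
```

Returning the three components alongside the product keeps the breakdown visible, which matters because the single OEE figure alone does not say which loss to chase.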
What the number means
OEE benchmarks from the research literature give this context:
- 85% is the recognized world-class threshold (Availability ≥ 90%, Performance ≥ 95%, Quality ≥ 99.9%), derived from Seiichi Nakajima's Total Productive Maintenance framework and cited by Tractian (2026).
- ≈60% is the cross-industry average, corroborated by multiple sources including InfluxData (2024) and LeanProduction/Fabrico (2024).
- 66.8% is the discrete-manufacturing average in Godlan's 2025 benchmarking data, with a high of 78.2% (medical devices) and a low of 57.2% (trailers and RVs).
The fabrication cell's 77.1% in the example above sits above the industry average and is approaching world-class, but its Availability component, at 87.5%, is still below the 90% world-class threshold, which points directly back to the press brake's MTBF and MTTR profile. That is the loop: a low Availability score means failures happen too often (MTBF too low), take too long to fix (MTTR too high), or both.
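One common way to put numbers on that link is the steady-state approximation Availability ≈ MTBF ÷ (MTBF + MTTR), often called inherent availability. This guide does not state that formula, so treat the sketch below as an outside assumption. It covers only failure-driven downtime on a single asset measured over operating hours, so it will not reproduce the cell's 87.5% shift figure; its use here is to show how each metric moves Availability.

```python
def inherent_availability(mtbf_hr: float, mttr_hr: float) -> float:
    """Steady-state share of time an asset is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hr / (mtbf_hr + mttr_hr)


baseline = inherent_availability(600, 4.4)        # press brake figures from earlier
more_failures = inherent_availability(300, 4.4)   # what-if: MTBF halves
slower_repairs = inherent_availability(600, 7.0)  # what-if: MTTR climbs to 7 hr

for label, value in [
    ("Baseline", baseline),
    ("MTBF halved", more_failures),
    ("MTTR at 7 hr", slower_repairs),
]:
    print(f"{label}: {value:.2%}")
```

Both what-if cases lower the result, which is the point of the loop: the same Availability shortfall can come from either direction.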
Facilities that manage OEE as a primary KPI see measurable operational gains. McKinsey research (via Cryotos, 2026) found that facilities actively managing OEE achieved up to 25% lower maintenance cost and 10%–20% higher throughput within 18 months. Frame that as a model on your own numbers: if your annual maintenance budget is $800,000 and your OEE is sitting at 62%, a disciplined improvement program that tracks Availability, Performance, and Quality separately — and links each back to the underlying reliability metrics — could shift the economics materially. The arithmetic is worth running with your own inputs.
For the full OEE calculation walkthrough, including how to separate the Six Big Losses and how to use OEE to prioritize which assets get PM attention first, see How to Calculate OEE.
For the specific derivation of the Availability component and how it connects to equipment uptime calculations, see Availability Calculation Explained.
How MTBF, MTTR, and OEE Connect to PM Intervals and Maintenance Cost
These three metrics are not just dashboard decorations. Each one feeds directly into decisions that show up in your PM schedule and your maintenance budget.
MTBF → PM interval calibration
The most direct application of MTBF is setting or validating a PM interval. The general principle: schedule preventive maintenance before the average failure point, with enough margin to account for the variability in your failure distribution. If MTBF is 600 hours, a PM interval at 480–520 hours gives you a reasonable buffer — you are intervening while the asset is still statistically likely to be running, rather than after it has already failed.
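As a sketch of that buffer logic, the helper below turns an MTBF into a candidate PM window. The 80% and 87% bounds and the 10-hour scheduling step are assumptions chosen only to reproduce the 480–520 hour range above; they are not a standard, and the right margin depends on how scattered your failure times are.

```python
def pm_interval_window(mtbf_hr, low_pct=80, high_pct=87, step_hr=10):
    """Return (earliest, latest) PM interval in hours, rounded down to a scheduling step."""

    def round_down(hours):
        return int(hours // step_hr) * step_hr

    earliest = mtbf_hr * low_pct / 100
    latest = mtbf_hr * high_pct / 100
    return round_down(earliest), round_down(latest)


earliest, latest = pm_interval_window(600)
print(f"Candidate PM window: {earliest}-{latest} hr")  # Candidate PM window: 480-520 hr
```

Any window this produces is a starting point to check against the OEM manual, not a replacement for it.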
OEM manuals are the starting point and should always be confirmed before adjusting an interval — manufacturer guidance reflects design intent and often dictates warranty and compliance requirements. MTBF from your own operational history is the validation layer: it tells you whether the OEM's general interval fits your specific duty cycle, environment, and load. They should agree reasonably well. When they diverge significantly, the discrepancy is worth investigating before changing the interval.
MTTR → staffing, spares, and cost-per-incident inputs
MTTR translates directly into repair labor cost:
Illustrative model: MTTR of 4.4 hours × 2 technicians × labor rate of $27.57/hr (BLS OES, Maintenance Workers, Machinery, SOC 49-9043, May 2023) = $242.62 in labor per incident, before parts. Multiplied by 7 incidents per year, that is about $1,698 in repair labor annually for this one asset. Add parts costs and you have the corrective-maintenance line for that asset in your annual budget.
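The labor model above is simple enough to script, which helps when you want to rerun it per asset. The function name is hypothetical, the inputs are the ones in the illustrative model, and the figures are computed at full precision before rounding for display.

```python
def repair_labor_cost(mttr_hr: float, technicians: int, rate_per_hr: float) -> float:
    """Labor cost of one corrective repair, before parts."""
    return mttr_hr * technicians * rate_per_hr


per_incident = repair_labor_cost(mttr_hr=4.4, technicians=2, rate_per_hr=27.57)
annual = per_incident * 7  # 7 incidents per year on this asset

print(f"Labor per incident:  ${per_incident:,.2f}")  # Labor per incident:  $242.62
print(f"Annual repair labor: ${annual:,.0f}")        # Annual repair labor: $1,698
```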
That per-asset figure, summed across the fleet, is the reactive-maintenance component of your total maintenance cost. The ratio of reactive spend to planned PM spend is one of the clearest indicators of program maturity: research cited by MapTrack (2026) found that operations without a digital maintenance system averaged 40%–55% of maintenance activity as reactive, compared with 15%–20% at operations running a maintenance management system. The DOE (via eWorkOrders, 2026) puts the cost differential plainly: reactive maintenance typically costs three to five times more than the same work planned, when all hidden costs are counted.
OEE → the cost case for PM investment
OEE turns reliability data into a production and cost argument that plant managers and CFOs can engage with directly. When Availability drops below 90%, the OEE math shows exactly how many units of throughput the gap is costing. When you can show that a $4,000 PM program on a critical asset — timed to a defensible MTBF-based interval — prevents two failures per year, and each failure costs 4.4 hours of downtime at whatever your facility's cost-per-hour actually is, the investment justification writes itself.
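One way to frame that justification is as a break-even question: at what downtime cost per hour does the PM program pay for itself? The sketch below uses the figures from the paragraph above and deliberately counts avoided downtime only, ignoring avoided repair labor and parts, so it understates the case. The function name is an assumption.

```python
def breakeven_downtime_rate(pm_cost: float, failures_prevented: int, hours_per_failure: float) -> float:
    """Downtime cost per hour at which avoided downtime equals the PM program cost."""
    return pm_cost / (failures_prevented * hours_per_failure)


threshold = breakeven_downtime_rate(pm_cost=4000, failures_prevented=2, hours_per_failure=4.4)
print(f"Break-even downtime cost: ${threshold:,.2f} per hour")  # $454.55 per hour
```

If your facility's downtime cost per hour is above that threshold, the program covers its own cost on downtime alone.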
For a structured framework connecting PM intervals to annual cost forecasting across a multi-asset fleet, see the Preventive Maintenance Interval and Cost Guide.
Putting the Three Metrics Together: A Diagnostic Loop
Think of MTBF, MTTR, and OEE as a short feedback cycle, not three independent calculations:
- MTBF tells you how often a failure is interrupting production. If it is shorter than expected, something in the operating environment or the PM interval is off.
- MTTR tells you how long each interruption lasts. If it is longer than it should be, the problem is in your response system — parts, documentation, skill, or process.
- OEE integrates both into a single production effectiveness score. A low OEE score points you back to which component — Availability, Performance, or Quality — needs attention, which then points you back to MTBF or MTTR for the root cause.
Run this loop quarterly on your critical assets. A consistent MTBF trend that shortens over time on a given machine is a signal that the PM interval or the PM task itself needs revision. A consistent MTTR trend that lengthens is a signal about your maintenance response capability, not about the equipment itself.
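A quarterly review like this can be reduced to two trend checks. The sketch below is one simple interpretation, flagging only strictly consistent trends; the quarterly values are hypothetical and exist purely to show the mechanics.

```python
def steadily_falling(values):
    """True if every value is lower than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


def steadily_rising(values):
    """True if every value is higher than the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


# Hypothetical quarterly history for one asset, oldest quarter first.
quarterly_mtbf_hr = [600, 570, 540, 500]
quarterly_mttr_hr = [4.4, 4.3, 4.6, 4.2]

review_pm_interval = steadily_falling(quarterly_mtbf_hr)  # MTBF shortening each quarter
review_response = steadily_rising(quarterly_mttr_hr)      # MTTR lengthening each quarter

print(f"Revisit PM interval or task: {review_pm_interval}")  # True
print(f"Revisit repair response:     {review_response}")     # False
```

A strict every-quarter rule is conservative; a real program might prefer a rolling average or a percentage threshold to avoid missing a noisy but genuine decline.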
At ten assets, keeping all of this in a spreadsheet is a challenge. At twenty or thirty, it becomes genuinely difficult: the failure log lives in one tab, the PM schedule in another, the cost estimates in a third, and the last person who understood how the formulas connected left six months ago. That is the point at which a persistent, multi-asset calculation engine (one that holds your asset registry, calculates PM due dates from your MTBF-informed intervals, and rolls up annual cost estimates across the fleet) does work that Excel cannot reliably sustain.
You can explore the full set of maintenance calculators, frameworks, and tools at the Maintenance Calculators Hub.
Common Mistakes When Tracking These Metrics
Mixing up operating time and calendar time in MTBF. A machine that runs one shift per day has roughly 2,000 operating hours per year, not 8,760 calendar hours. Using calendar time inflates MTBF and makes the asset look more reliable than it is. Be explicit about your denominator.
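To see how large the distortion is, here is a small sketch comparing the two denominators. The hour figures come from the paragraph above; the failure count of 4 is hypothetical.

```python
CALENDAR_HOURS_PER_YEAR = 8760
OPERATING_HOURS_PER_YEAR = 2000  # roughly one shift per day
failures = 4                     # hypothetical count for illustration

mtbf_operating = OPERATING_HOURS_PER_YEAR / failures
mtbf_calendar = CALENDAR_HOURS_PER_YEAR / failures
inflation = mtbf_calendar / mtbf_operating

print(f"MTBF on operating time: {mtbf_operating:.0f} hr")  # 500 hr
print(f"MTBF on calendar time:  {mtbf_calendar:.0f} hr")   # 2190 hr
print(f"Calendar time overstates MTBF by {inflation:.2f}x")  # 4.38x
```

The overstatement factor is just the ratio of calendar to operating hours, so it does not depend on the failure count you plug in.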
Including planned downtime in MTTR. A scheduled 8-hour PM window is not a repair event. If you count it in MTTR, you inflate the metric and make your corrective response look slower than it is. Track planned and unplanned downtime in separate columns from day one.
Treating OEE as a single per-asset number. OEE is most actionable when it is tracked per asset and per shift, so you can separate equipment behavior from operator or product-mix effects. A machine that scores 82% OEE on the day shift and 61% on the night shift is telling you something about more than the equipment.
Setting PM intervals from MTBF without accounting for failure distribution. MTBF is a mean, not a minimum. If your failure times are widely scattered (some at 300 hours, some at 900 hours), an interval set at 80% of MTBF may still miss a meaningful fraction of failure events. Tracking the range of failure times, not just the average, gives you a better interval target. This is the first step toward Weibull-based reliability analysis, though for most SMB operations, a well-tracked MTBF with a conservative margin is already a significant improvement over tribal knowledge.
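A quick way to check this on your own data is to count how many past failures would have occurred before a proposed interval. In the sketch below the individual failure times are invented for illustration; they were picked to average 600 hours and to span the 300 to 900 hour scatter mentioned above.

```python
from statistics import mean

# Hypothetical operating hours between successive failures on one asset.
hours_between_failures = [300, 450, 520, 600, 680, 750, 900]

mtbf = mean(hours_between_failures)
pm_interval = 0.80 * mtbf  # interval set at 80% of MTBF

# Failures that arrived before the PM would have been performed.
missed = [t for t in hours_between_failures if t < pm_interval]
share_missed = len(missed) / len(hours_between_failures)

print(f"MTBF = {mtbf:.0f} hr, PM interval = {pm_interval:.0f} hr")
print(f"Range of failure times: {min(hours_between_failures)}-{max(hours_between_failures)} hr")
print(f"Failures before the PM interval: {len(missed)} of {len(hours_between_failures)} ({share_missed:.1%})")
```

With this invented data set, two of the seven failures land before the 480-hour interval even though it sits well below the mean.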
Leaving these metrics in a spreadsheet that only one person understands. The value of MTBF, MTTR, and OEE is cumulative — they become more useful the longer and more consistently you track them. A spreadsheet that breaks when someone adds a column, or whose formulas nobody else can audit, is a single-point-of-failure for your reliability program.
Start Calculating With Your Own Data
MTBF, MTTR, and OEE are straightforward to calculate once you have consistent failure and repair records. The difficulty is not the math — it is maintaining the discipline of logging failure times, repair durations, and production counts with enough consistency to make the averages trustworthy.
If you want a structured starting point, the MTBF/MTTR/OEE Calculator Workbook gives you pre-built formulas for all three metrics, a failure log template, and an OEE breakdown by the six major loss categories — ready to populate with your own asset data. It is designed for the maintenance manager who wants to start tracking these numbers this week, without building the spreadsheet architecture from scratch.
Once you have a baseline MTBF for each critical asset, you have the core input for a rational PM interval. And once you have PM intervals on paper, the next step is projecting what those intervals cost across the fleet — in labor hours, parts, and annual budget. The Preventive Maintenance Interval and Cost Guide walks through that translation.