Easy Lessons to Master Queuing Theory
This article will elevate someone with no knowledge of queuing theory to a level of comprehension that surpasses that of the highest-paid call center consultants. The next time you read a book on call center planning or attend a seminar on the same topic, you will already know more than the instructor.
The very first thing you need to know about queuing theory is that it actually has nothing to do with queuing. Really. That’s true.
Queuing theory assumes each interval starts with an empty queue and ends with every caller hanging up. This is a profoundly false assumption — every interval of every day. In reality queues flow across intervals. Callers waiting in the queue at the end of one interval continue to wait. Everyone who arrives in the next interval will wait behind them. These are the most basic principles of a Queue.
So how did it come to be that queuing theory is completely devoid of any basic understanding of queuing? It's an important question, because every call center and every WFM solution has relied on the same absence of queuing knowledge to perform forecasting and scheduling. It's been like that for the last 30 years. However, the root of the problem is much older, dating back over 100 years.
Queuing theory was conceived at a time when data collection tools were mechanical and rudimentary. Banks could count the transactions that each teller processed. The tracking was recorded with pen ticks on a piece of paper. The teller or their supervisor would add a tick to a piece of paper after each transaction. The ticks were totaled at the end of the hour.
Agner Erlang worked for the Copenhagen Telephone Company between 1908 and his death 20 years later. He worked during a time when most telephone switching was performed by peg board operators. A caller would pick up their phone. An operator would answer, listen to the desired phone number then make the connection with a series of wire connections on a peg board.
Erlang was more interested in the automated switching units that had just come on the market. He wanted to help local telephone exchanges make rational decisions about how many circuits to purchase for automatic switching units.
Most of Erlang’s data came from automatic switching units. These machines recorded call counts on a dial that looked much like a car odometer. The only data that could be recorded were successful calls that connected immediately. The mechanical switches could neither hold calls in queue, nor record wait times, nor record the number of blocked or abandoned callers. If the switch ran out of circuits, callers experienced dead air and no data was collected on these calls.
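The circuit-sizing question Erlang set out to answer can be made concrete. His loss formula, known today as Erlang B, estimates the fraction of calls that hit dead air when a given offered load (in erlangs) arrives at a fixed number of circuits. Here is a minimal sketch using the standard recurrence; the function name is ours, but the formula itself is Erlang's:

```python
def erlang_b(circuits: int, offered_erlangs: float) -> float:
    """Blocking probability for a loss system with no queue (Erlang B).

    Uses the standard recurrence B(0) = 1,
    B(n) = a*B(n-1) / (n + a*B(n-1)).
    """
    b = 1.0  # with zero circuits, every call is blocked
    for n in range(1, circuits + 1):
        b = (offered_erlangs * b) / (n + offered_erlangs * b)
    return b

print(erlang_b(2, 1.0))  # 0.2
```

For example, one erlang of offered load on two circuits yields a blocking probability of 0.2: one call in five finds every circuit busy and gets dead air.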
Agner Erlang decided to apply queuing theory from the banking industry to telephone calls. In 1909, Erlang published his ideas, written in his native Danish language. The translated title of his article was “Theory of Probability and Telephone Conversation”. Erlang made many detailed observations about the weaknesses of queuing theory. Specifically he noted the method did not work for any one (or more) of the following:
- Small intervals
- Long service times
- Event-driven activity (non-random arrival patterns)
- Data collected from equipment that was already working at capacity
- Data collected while long wait times or abandons were a problem
Erlang also noted that interval based forecasting made the very unrealistic assumption that each interval was independent from activity in other intervals.
So if the weaknesses of queuing theory were so large, why was it used?
The answer is simple. One hundred years ago, planners did not have computers. All of their computations had to be performed by hand. The day was divided up into hours. Each hour was called an interval.
Manual calculations were performed on the hourly data. To make calculations manageable they made the assumption that each interval was unaffected by the activity from the intervals that preceded it. Thus the demand forecast for any given interval was purely a function of the average number of events and the average transaction time. Clearly, queuing theory was the ideal technology for a period in history when computers did not exist.
Real modern call centers not only have computers, they also have queues. These are two noteworthy differences between now and 100 years ago. Today, callers dial into a call center. Their call is offered to the queue where they wait until serviced or until they abandon. Any callers who remain in queue at the end of the interval remain in the system. They continue to wait in queue as the next interval starts. Everyone who arrives in the next interval will wait behind them. So activity in one interval does affect the next.
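That carry-over behavior is easy to sketch. The toy model below is a hypothetical illustration, not any product's algorithm: each interval answers up to a fixed number of calls, and everyone else waits into the next interval.

```python
def queue_carryover(arrivals, capacity):
    """End-of-interval backlog when waiting callers flow across
    interval boundaries instead of vanishing at each boundary."""
    backlog = 0
    backlog_by_interval = []
    for offered in arrivals:
        demand = backlog + offered        # earlier callers wait in front
        answered = min(demand, capacity)  # each interval answers at most `capacity`
        backlog = demand - answered       # the rest carry into the next interval
        backlog_by_interval.append(backlog)
    return backlog_by_interval

# Four intervals with 60 answerable calls per interval:
print(queue_carryover([40, 80, 160, 40], capacity=60))  # [0, 20, 120, 100]
```

An interval-independent model would see the final interval as quiet at 40 arrivals, while the queue that actually greets those callers is 120 deep.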
If a call center agent stood up and announced "I can't wait until the end of the interval so that all of these callers instantly hang up", the agents around him would conclude he was remarkably out of touch with the concept of a queue. The fictitious agent's total lack of queuing knowledge has been ingrained in call center forecasts from day one. Whether planners use WFM software or Excel spreadsheets to plan, the resulting forecasts are characterized by a total absence of queuing insight. It's been like that for more than 30 years.
At this point, congratulate yourself. Your knowledge has already surpassed that of all the experts who think queuing theory is about queuing.
The Virtual Queue
Now let's teach you a few more things that the experts never learned. Most of them teach that Erlang's method was developed to make sure callers did not wait too long in queue. That's not true. Erlang never experienced a real queue. The switches of his era had no ability whatsoever to hold a call in queue. He worked with mechanical switches that gave out dead air when the available circuits were occupied.
Erlang was a very clever and creative person. He was the first to think of a rationale to apply queuing theory (from banks that did have queues) to telephone networks (that had no queuing capability). He proposed the following reasoning:
“Callers who encounter dead air are likely to try a few times to try to get through. If they keep trying, they are not in a real queue, but in a sense, they are in a virtual queue. There is no physical limit to the number of callers who can redial into dead air. Therefore, the virtual queue has no size limit. In other words, the virtual queue is infinite in size.”
A lot of experts who think they understand Erlang's method will say that Erlang assumed callers would wait infinitely. However, that statement is completely wrong. Erlang stated that the size of the queue was infinite. He never stated that callers would wait infinitely. In fact, he said the opposite: callers were likely to give up if there was excessive call blocking.
Abandoned Caller Data
Another little known fact is that Erlang never had any abandoned caller data to work with. If a caller gave up on redialing, the switch would not record the transaction. Successful transactions moved a mechanical switch arm. That movement added a call count to something that looks very much like a car odometer. Blocked and abandoned callers had no effect on the switch arm so they were not counted.
This understanding of Erlang’s teachings is very important. There are a lot of experts who will teach you that the Erlang method will over-forecast due to its assumption that no callers abandon.
However, Erlang never had any abandon data to work with. He could not have over-forecasted with data that he never had. Instead, Erlang described the reliability of the raw data as being quite fragile.
With no blocked or abandon data, the planner never knew how much data was being lost. Therefore Erlang stated that the underlying data was only of good quality if it was reasonable to assume that few if any callers were abandoning. That meant assuming that wait times were relatively low.
Erlang’s statements about abandons and wait times are often called assumptions. But they weren’t really assumptions. They were actually data quality parameters.
What he was really saying is that his proposed method was only reliable if the underlying data had been gathered from a system that had been operating with virtually no wait times and no abandons.
If a busy hour study demonstrated that all ten circuits of a switch were fully utilized then it was very likely that the data had been corrupted by long waits and high abandons. If so, his recommendation was to declare the data and the forecast invalid.
If planners did not perform the utilization test, they had no idea if the data collected was useful or corrupt.
Creating a forecast with corrupt data would yield a false recommendation for the proper number of circuits. The system would never grow to its proper size because the measurement of load was limited by the prevailing capacity.
To paraphrase Erlang’s teaching on wait times and abandons…
“Wait times and abandons corrupt call count data. If data is collected during a period of unacceptable wait, the resulting forecast will be limited to the prevailing capacity. That’s bad. It will strangle and distort the capacity of the system that you are hoping to optimize.”
It's common practice in the call center industry to discard abandon data. There are different ways of doing this. Most call centers discard short abandon data. Short abandons are callers who abandon in less than n seconds. It's common for 10 seconds to be used as the short abandon threshold. Some call centers use 30 seconds. Others use even longer thresholds.
Many WFM solutions further reduce abandons using what is called a patience threshold. The threshold may also be called impatience. The less patient you say your callers are, the greater the number of abandons that will be presumed by the forecast. An aggressive patience threshold will tell the system to reduce the offered call counts, from what was collected, to a new smaller number. So if 100 calls were offered, the patience threshold might reduce that figure to 90.
Note that many call centers use both a short abandon threshold and a patience threshold. So several layers of abandon data are discarded.
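To make the layering concrete, here is a hypothetical sketch of the two discard steps. The function name, the 10-second cutoff and the flat 10% patience loss are illustrative assumptions, not any vendor's documented method:

```python
def adjusted_call_count(answered, abandon_waits,
                        short_abandon_cutoff=10, patience_loss=0.10):
    """Apply the two discard layers described above to one interval.

    abandon_waits: seconds each abandoning caller waited before hanging up.
    """
    # Layer 1: short abandons (hang-ups before the cutoff) are never counted.
    counted_abandons = [w for w in abandon_waits if w >= short_abandon_cutoff]
    counted = answered + len(counted_abandons)
    # Layer 2: a patience threshold shaves a further flat percentage off.
    return counted * (1.0 - patience_loss)

# 45 answered calls plus 5 abandons at 4, 8, 45, 120 and 200 seconds of wait:
print(round(adjusted_call_count(45, [4, 8, 45, 120, 200]), 1))  # 43.2
```

Fifty callers sought service, but after both layers the forecast is fed a count of 43.2.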
The calculated loss is not necessarily limited to the historical abandon rate. It’s entirely possible for an aggressive patience threshold to produce a higher loss prediction than the original abandon rate.
The practice of creating loss is justified as compensation for Erlang's weak assumption that no callers will abandon. It's a poor justification because Erlang made no such assumption. He only pointed out that the loss of abandon data compromises the ability of a forecast to respond to changing conditions. And really, that's exactly what is accomplished by discarding short abandons and generating loss. These methods compromise a system's ability to respond to change.
Let's get more familiar with the various methods for generating loss. We have already discussed the simple patience factor. You may have also heard of other methods called simulations and "PALM". Simulations use random numbers to reduce actual call counts to a number that is lower than the historical call counts. PALM stands for Patience And Loss Model.
Various flavors of PALM are available. Some are indistinguishable from patience factors. Other PALM models use the historical abandon rate in each interval as the loss rate for the same interval in the future. These are promoted as being more accurate. It should be clear though that throwing out data does not make a system more accurate. Erlang knew that 100 years ago and so should planners today.
Simulations warrant some further discussion. In promotional literature, simulations are typically described as advanced methods for producing more accurate forecasts that are also more suitable for complex multi skill environments. Some are referred to as discrete event simulations.
In all cases, the historical demand inputs to simulations are limited to historical call counts and average talk time. Some simulations create loss by reducing call counts through the use of random numbers to generate random call blocking figures. If the random numbers suggest a certain number of calls might be blocked, that’s the loss, and that’s the number that gets chopped off your call counts. The lowered call counts are entered into an Erlang formula and the result is reduced staffing requirements.
Other simulations do essentially the same thing but with talk times. If 100 calls are expected and the average duration is five minutes, the simulation creates 100 random talk times. The random number function is asked to produce numbers with a long run average of 5 minutes, for example: Rand(300 seconds). Those 100 random numbers are averaged for a new simulated talk time. Sometimes the average is higher than 5 minutes. Sometimes the average is lower. The long run average is typically 4% to 7% lower.
However, the 4% to 7% loss is really just a function of rounding errors. Random numbers tend to have more than 8 digits to the right of the decimal point. If you round that to two digits before you average, the long run average comes out 4% to 7% lower.
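The talk-time procedure described above can be sketched as follows. The exponential distribution is our assumption here, since vendors rarely document which distribution their random draws use:

```python
import random

def simulated_talk_time(n_calls=100, mean_seconds=300.0, seed=0):
    """Draw n random talk times around a historical mean and return
    their average, per the simulation procedure described above."""
    rng = random.Random(seed)
    draws = [rng.expovariate(1.0 / mean_seconds) for _ in range(n_calls)]
    return sum(draws) / len(draws)

print(simulated_talk_time())  # some value near, but not at, 300 seconds
```

Whichever seed you pick, the simulated average lands somewhere above or below the historical five minutes. The gap is sampling noise, not insight.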
Planners who rely on simulations to produce forecasts swear by them. They claim to have dramatically better forecast accuracy using the simulation forecast vs a regular forecast. This is pure illusion. To understand the illusion, you need to learn a few more things first.
The only difference between a simulation forecast and an Erlang forecast is the amount of loss. Both are Erlang forecasts. The post-loss figures from the simulation are ultimately entered into an Erlang function to yield the recommended staffing levels. If you compare an Erlang forecast to a simulation forecast for the same period the differences between the two are random because random numbers were used to generate the loss.
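The "Erlang function" that staffing levels are ultimately read from is usually the textbook Erlang C delay formula, which turns an agent count and an offered load into a probability of waiting. A minimal sketch, assuming the standard Erlang B recurrence and B-to-C conversion:

```python
def erlang_c(agents: int, offered_erlangs: float) -> float:
    """Probability that a caller must wait (textbook Erlang C),
    via the Erlang B recurrence plus the standard B-to-C conversion."""
    if offered_erlangs >= agents:
        return 1.0  # overloaded system: every caller waits
    b = 1.0  # Erlang B with zero servers
    for n in range(1, agents + 1):
        b = offered_erlangs * b / (n + offered_erlangs * b)
    rho = offered_erlangs / agents  # occupancy
    return b / (1.0 - rho * (1.0 - b))

print(round(erlang_c(1, 0.5), 6))  # 0.5
```

A single agent at 50% occupancy gives callers a one-in-two chance of waiting. Feed this function a post-loss call count and the resulting load, and hence the staffing requirement, shrinks accordingly.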
Simulation product literature may compare a regular forecast to a simulation forecast for the same period. The differences may be described as intelligently simulated projections that are more realistic and reduce staffing costs.
Taking real numbers and making them different using random numbers will never make the real numbers more real. If you accept that logic, you’ll quickly conclude for yourself that simulations are not a sophisticated way of producing more accurate forecasts, they are just a random way of creating smaller forecasts.
Wait For It
If you think you now understand “LOSS” better than the experts, you are correct. However, you don’t understand loss just yet. That will come in a minute.
So we just said that taking real numbers and applying loss does not make them more realistic. What we left out is that none of the numbers are ever real to start with. That's because call centers always forecast with answered and abandoned call counts, not true offered call counts.
So if 100 calls arrive that’s 100 offered calls. But if you only answer 45 and 5 abandon, that’s a total of 50. The real offered call count is 100. The fake offered call count is 50. Now apply your loss from impatience, or PALM, or simulator. The fake call count is now reduced to something like 45. And don’t forget that you never bothered to count several short abandons.
Now you have a perfect understanding of Loss, Patience factors, PALM and Simulations. They take fake numbers and make them smaller. That’s it.
So now you are ready to understand why simulations appear to produce higher forecast accuracy. Working with fake numbers, smaller forecasts will always produce higher illusory forecast accuracy.
This is a difficult concept for many to grasp so let’s ease into it.
The forecast accuracy metric is not really measuring forecast accuracy. It's actually measuring a call center's utilization of planned capacity. If you plan to answer 100 calls and you answer 100 calls, you get 100% forecast accuracy. If 2 callers abandoned, that takes the forecast accuracy down to 98%. If they were short abandons, you are back up to 100%. Even if the prevailing demand needed you to answer 200 calls, answering only the 100 that you planned to answer will give you 100% forecast accuracy.
Here's the forecast accuracy formula (the fraction is the error rate; accuracy is one minus the error):

Forecast Accuracy = 1 - |Forecasted Calls - (Answered + Abandoned)| / Forecasted Calls
Underplanning and understaffing lead to higher utilization of the planned resources, so capacity utilization goes up. If hundreds of calls wait longer than 20 minutes for access to an available agent, this also improves capacity utilization. The wait times have no bearing on illusory forecast accuracy.
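The point is easy to demonstrate with the metric, taking accuracy as one minus the error fraction:

```python
def forecast_accuracy(forecast: float, answered: int, abandoned: int) -> float:
    """The interval metric, expressed as a percentage."""
    error = abs(forecast - (answered + abandoned)) / forecast
    return 100.0 * (1.0 - error)

# Plan 100 and answer exactly the 100 you staffed for: a "perfect" forecast,
# even if 200 callers actually wanted service that interval.
print(forecast_accuracy(100, answered=100, abandoned=0))  # 100.0
# Two counted abandons pull the metric down slightly...
print(round(forecast_accuracy(100, answered=100, abandoned=2), 1))  # 98.0
# ...unless they were short abandons, which were discarded before counting.
print(forecast_accuracy(100, answered=100, abandoned=0))  # 100.0
```

Notice that true demand and wait times never appear in the calculation. The metric can only compare the plan against what the plan allowed to happen.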
Simulating loss manufactures smaller forecasts. Simulations cause under planning and longer wait times so they drive higher capacity utilization and higher illusory forecast accuracy.
So the planners who swear that simulations give them higher forecast accuracy are telling the truth about the metrics, they just don’t realize that the metrics are not telling the truth.
Call center forecasts that are built with interval data have only two historical inputs. The first is a fake call count that has nothing to do with the real number of calls offered to the switch. The second input is an average talk time. Notice that there is no wait time data. You can input a target service level but that's just a target. There is no historical context for that target. You can input a patience factor but that will just produce loss so that the fake call counts can be lowered.
Wait Times Never Mattered
Forecasting experts will tell you that wait time was never an input into the Erlang method. They’ll tell you that wait time is an outcome of the forecast, not an input to the forecast. It is true that the Erlang method only considers call counts and talk time. It is true that the inputs to Erlang’s method do not include wait time information.
The absence of wait time data was never Erlang’s choice or recommendation. The switches of his day had no means of holding calls in queue and therefore no way of tabulating wait time data. Even though no wait time data was available to him, Erlang insisted that negligible wait times were absolutely critical to his planning method. Any evidence of potentially long wait times made it clear that the call counts were tainted and unusable.
Reproducing Erlang’s Success
Erlang was limited to answered call data, because unanswered calls never reached the switch. Erlang had to work with an absence of wait time data because none could be collected. None of those limitations apply today. That’s actually problematic for the forecasting industry because modern call counts are far more volatile than Erlang ever needed to deal with.
So the planning industry in general (and WFM systems in particular) have had to work very hard to replicate Erlang’s successes. They have accomplished this by manufacturing data that reproduces all of the poor data conditions that prevailed over 100 years ago. They’ve done this by:
- Counting calls at the rate they are answered and abandoned. This sum is ignorant of demand and rarely deviates from the planned call answering capacity by more than about 2%.
- Discarding as much abandon data as possible. This artificially narrows the potential variance from 2% to something much smaller.
- Compelling agents to adhere to the schedule. This limits the calls that can be counted in the future to a rigid capacity plan that likely never changes. The stability supports high forecast accuracy as long as forecast accuracy is also calculated using the entirely fictitious manufactured call counts.
The amazing result is that call centers today plan with data that is artificially diminished to match the poor quality data that used to flow from 100-year-old mechanical switches. Modern switches collect perfectly descriptive data. However, that data is manipulated to make it just as bad as the answered call counts that were read from ancient switch odometers.
However, it would be unfair to say that nothing has changed. Erlang knew the risks of capacity based data. Today, most planners do not.
Understanding True Offered Call Data
Erlang’s switches were naturally limited to capacity based data. Today, WFM solutions intentionally manufacture capacity based data. Why is that?
If any call center ever tried to forecast with true offered call counts, they would immediately see that it’s a completely unworkable method.
Erlang actually predicted this a hundred years ago. He observed that queuing theory and interval forecasting were both ignorant of the carry over effects of calls. Erlang’s solution for this problem was to recommend very long planning intervals like an hour. The length of the planning interval had to be much longer than the talk time. If the planning interval was too short, the lack of carry over knowledge would cause profound distortions.
Erlang also pointed out that his method was completely unsuitable for any situation where call volumes were increasing or decreasing rapidly. He said this was event driven activity. Queuing theory provided capacity recommendations that assume random call arrivals. If the actual arrival is event driven then interval based forecasting will always provide a false and misleading capacity recommendation.
Today, WFM software performs forecasting with 15 minute interval data. That’s way too short to avoid distortions. What’s even worse is that this method is used in situations where the interval call counts are doubling or tripling from one interval to the next. That’s event driven activity.
If this were to be done with true offered call counts, the staffing recommendations would be pure chaos.
So the industry's solution has been to discard the offered call counts and manufacture fake ones that mirror the prevailing capacity. Answered plus abandoned call counts are very stable because they are limited to the call answering capacity plus the abandon rate. Discarding abandon data using short abandon thresholds and loss factors serves to further muffle the real data.
Strong adherence policies add yet another layer of stability and muffling to the falsification of data.
In a small way, these strategies are effective. Fifteen minute interval forecasts for event driven activity should look overtly unreal. By faking the data, the forecasts always look credible, but only because all of the data is entirely fake.
In the end, it's probably a small price to pay for forecasts that always look accurate and schedules that never respond to changing conditions. That's because WFM vendors don't encounter any of the costs of long wait times, angry customers, demoralized agents and lost call center revenue.
Lessons in Time
The limitations of interval length and random call arrivals were taught by Erlang. Oddly, the knowledge on these topics has been reversed by most of the instructors, courses and books. Most experts teach that calls always arrive randomly in every interval. It's not true. The original knowledge is that interval forecasts should never be used unless calls appeared to be arriving randomly during the data collection period. Most experts teach that 15 minute interval forecasting is more accurate than 30 minute interval planning. It's not true.
However, it does not really matter that much — because the call counts are always faked. The fake data produces high fake forecast accuracy.
And now you know everything you need to know to tell the experts that you understand queuing theory much better than they do.
Why limit yourself to knowing more than the experts know about the history of queuing theory?
The next step is to become a queuing theory visionary. Visionaries are those who are among the first to recognize the future state of an industry.
The Future of Queuing Theory
Queuing theory is over 100 years old. No other 100-year-old method in the world would be considered even remotely relevant to an information technology industry. Yet queuing theory is used almost everywhere: banks, hospitals, grocery stores, call centers, amusement parks and so forth.
Queuing theory is still considered a viable technology by the vast majority of capacity planners.
To call this surprising is an understatement. Arguably, queuing theory has always been more of a belief system than a technology.
One hundred years ago, banks thought they were using the transaction processing rate of tellers to project future demand. However, visitors could only be counted at the rate tellers were prepared to process them. Hence the belief that demand forecasts were coming true was unwarranted.
In reality, queuing theory was not producing demand forecasts. Instead, it was actually producing historical capacity averages. Those capacity averages became the new upper limit for staffing levels. Within those limits, banks could only process the forecasted number of transactions. Voila! High forecast accuracy and long lines. Note that jokes about bank line-ups are about as old as queuing theory.
Erlang advanced the science of queuing theory by establishing data quality tests to identify when corrupted data was likely flowing from fully utilized systems. Unfortunately, the tests were misunderstood to be assumptions so very little changed. The planning inputs were limited to answered call data. The lack of super capacity demand statistics fortified the belief that forecasts were coming true.
Today, and for the past 30 years, super capacity information has been available from modern call center switches. The count of calls offered to the queue each interval is one form of data that accounts for the total number of callers seeking service.
However, queuing theory immediately fell apart the moment the call counts were no longer restricted to answered calls. This demonstrates that queuing theory never actually worked. Forecasts based on real offered calls never come true. In fact these forecasts are so bad they are unusable. The industry noticed that 30 years ago.
So how did the industry respond when better quality data produced unusable forecast projections?
Instead of fixing the problem the industry changed the data. Super capacity data was ignored completely. Data was manipulated so that it could only be counted at the rate that calls are answered (plus a small subset of abandons).
The resulting data is as close as it gets to pure answered call counts. However, the manufactured data was mis-labeled as offered call data. So for 30 years, planners believed they were getting high forecast accuracy from offered call forecasts. It was never true. It was just a belief system that was supported by falsified information.
While belief systems are slow to fade, the final destination for queuing theory is likely a fate similar to Norse, Roman and Greek mythologies. These are fascinating artifacts of history but they no longer have a following of believers.
Software that forecasts demand according to the habits of Zeus will share the same credibility as software that forecasts with queuing theory.
Planners will be mystified that their predecessors once planned around the assumption that every queue spontaneously empties every 15 minutes.
Any call center, bank, grocery store, hospital etc. can replace the queuing theory that never actually worked. The replacement technology is easily accessible and universally applicable to any business that has customers and servers.
The technology is called an SCO forecast. No call counts, no fake call counts, no fake loss. Just second-by-second analysis of every call (or visitor).
SCO forecasts intricately understand real queuing activity including the true to life behavior of queues flowing across intervals.
The result is truthful forecasts that immediately move any business on to its ideal staffing curve. Call centers that have made the change have experienced phenomenal improvements. Labor productivity is maximized. Wait times are minimized. Metrics immediately become truthful.