Why Your CPU Capacity May Not Match Your Vendor's Estimate

IBM publishes their Large Systems Performance Reference (LSPR) ratings, and Amdahl and HDS publish their relative performance ratings for new processor speeds and capacity. Do these ratings match your workloads, and will your work experience the performance differences as published by the vendors? This paper provides an explanation of why (and why not) your performance may match the vendor's performance results. It also provides some suggestions on how to confirm the performance you receive.

In addressing this issue, I'll cover the following:

1. Definition of terms
2. Why use vendor claims?
3. How do vendors meet their claims?
4. IBM's LSPR ratings
5. Amdahl's performance ratings
6. HDS's performance ratings
7. Why you should use these claims
8. Why your installation's experience may differ
9. What can you do

This discussion describes considerations for MVS or OS/390 systems running on IBM, Amdahl, or HDS processor models. Most of the issues addressed, however, apply to VM and VSE systems as well.

1 DEFINITION OF TERMS

Throughout this paper, I'll use various terms and I want to indicate my definition of these terms.

1.1 CPU VS. Model VS. CEC VS. Machine

Since every author uses different terms for describing a processor complex, I'll start with the definitions I'll use in this paper.

A CPU is a single processor that can execute instructions on the behalf of some unit of work. It will have one, and sometimes more, high speed buffers in which to store data while being referenced. A CPU can be dispatched by the operating system to execute one unit of work, such as a TCB or SRB, at a time. A CPU is sometimes referred to as a processor, but I'll avoid that use in this paper, because some people refer to a processor as having multiple CPUs.
A processor model is a combination of one or more CPUs and is distributed with central and expanded storage, an I/O processor (CPU), possible system assist processors, system control processors, and various levels of cache buffer storage. A vendor will normally market many models, such as the IBM 9672-RX4, the HDS Pilot R7, or the Amdahl Millennium GS545. Various authors will refer to these processor models as CPCs (Central Processing Complexes) or CECs (Central Electronic Complexes), machines, or simply processors. I'll use model or machine in this paper.

1.2 Speed

Speed is the relative ability of a single CPU to perform work. A CPU with a faster speed than another should be able to process more work in the same amount of CPU time. The speed of a CPU is often rated in terms of MIPS (once referred to as Millions of Instructions Per Second) as described below. The speed of a single CPU in a model of multiple CPUs is often referred to as the "uni-" speed and equates to a single-CPU model in a series of models that are built using the same uni-processor (a single CPU).

1.3 Capacity

Capacity, on the other hand, is the relative ability of all of the CPUs in a model to perform work. A model with a higher capacity than another should be able to process more work in the same elapsed time.

Given a model's single CPU speed, I define capacity as being equal to the effective CPU speed multiplied by the number of CPUs in the model. In a uni-processor, the capacity and speed are the same.
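The capacity definition above can be sketched in a few lines of Python. The function name and the choice of sample rows are illustrative assumptions; the MIPS per CPU figures are the effective per-CPU speeds listed in Figure 1.

```python
def capacity_mips(effective_mips_per_cpu: float, num_cpus: int) -> float:
    """Capacity = effective single-CPU speed x number of CPUs (section 1.3)."""
    if num_cpus < 1:
        raise ValueError("a model needs at least one CPU")
    return effective_mips_per_cpu * num_cpus


# (model, effective MIPS per CPU, number of CPUs) as listed in Figure 1
SAMPLE_MODELS = [
    ("9021-711", 62.0, 1),
    ("9672-R45", 51.3, 4),
    ("9672-RX5", 39.4, 10),
]

if __name__ == "__main__":
    for model, mips_per_cpu, cpus in SAMPLE_MODELS:
        print(f"{model}: {capacity_mips(mips_per_cpu, cpus):.0f} MIPS")
```

For the uni-processor 9021-711 the capacity equals the speed, while the 10-way 9672-RX5 has a slower effective CPU but a much larger capacity.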
Machine          # of  SU/Sec    Avg   Min   Max   MIPS/  MP%   TSO   CICS  DB2   IMS   CB84  CBW2  FPC1  Proc  MSU  MIPS/
                 CPs             MIPS  MIPS  MIPS  CPU          MIPS  MIPS  MIPS  MIPS  MIPS  MIPS  MIPS  Grp        MSU
9021-711         1     3018.11   62    61    62    62.0   100   62    62    62    62    61    61    62    40    11   5.6
9021-962         6     2444.67   306   288   325   50.9   82    298   320   301   288   325   342   358   80    53   5.8
9021-972         7     2384.30   348   325   374   49.7   80    337   366   343   325   374   398   414   80    60   5.8
9021-982         8     2323.94   389   360   422   48.6   78    377   410   383   360   422   453   469   80    67   5.8
9021-9X2         10    2172.70   465   425   515   46.5   75    447   492   458   425   414   563   574   80    78   6.0
9672-R15         1     2937.94   62    62    62    62.0   100   62    62    62    62    62    62    62    40    11   5.6
9672-R25         2     2732.24   115   113   115   57.7   93    116   117   115   113   115   122   121   50    20   5.8
9672-R35         3     2555.91   162   156   167   54.1   87    167   166   159   156   164   180   177   60    28   5.8
9672-R45         4     2438.28   205   196   212   51.3   83    212   211   200   196   208   237   233   70    35   5.9
9672-R55-1-way   1     3129.28   66    62    69    65.9   100   69    68    63    62    67    79    76
9672-R55         5     2472.19   260   247   273   52.1   79    273   269   249   247   265   311   301   80    45   5.8
9672-R65         6     2378.47   296   280   310   49.4   75    308   308   280   280   310   369   356   80    51   5.8
9672-R75         7     2253.20   328   309   345   46.9   71    339   340   309   311   345   422   414   80    57   5.8
9672-R85         8     2127.94   355   332   375   44.4   67    365   368   332   339   375   473   474   80    61   5.8
9672-R95         9     2002.75   376   351   399   41.8   63    386   390   351   361   399   521   534   80    64   5.9
9672-RX5         10    1908.85   394   367   419   39.4   60    402   410   367   379   419   564   596   80    69   5.7
9672-RY5         10    2133.05   439   400   472   43.9   61    448   461   400   422   472   637   665   80    78   5.6

Figure 1 - Extract of Cheryl Watson's CPU Chart

It is possible for one model to have a faster CPU speed, but a smaller capacity (due to having fewer CPUs) than another model. You can also have one model with a slower CPU speed, but a larger capacity than another model due to a large number of CPUs.

1.4 MIPS

When CPUs were considerably less complex than they are today, speed ratings in terms of Millions of Instructions per Second (MIPS) were used to rate the CPUs.
Because the CPU instruction set and the processor models have gotten much more complex, the use of a single number to identify the speed of a CPU has lost a lot of its accuracy. Today, it's much more common to hear of CPU speeds as a range of MIPS or relative processing power.

Most MIPS ratings today are simply based on the vendor's claims of the relative performance of each model. Many analysts will provide these MIPS ratings based on the vendor's claims in order to provide a consistent view of speed and capacity across multiple vendors. Gartner Group, the Meta Group, IDC, and Watson & Walker are among the groups who publish MIPS ratings. We publish ours in Cheryl Watson's TUNING Letter [REF001], and I'll use our MIPS ratings for all references in this document. The reason for the continued use of MIPS is that customers are more comfortable with MIPS than relative performance numbers.

The primary value of MIPS is to provide a starting point to identify a group of processor models that are close to the capacity required. A single number will not provide a good estimate of what you can expect to receive.

There are two types of MIPS to be concerned with. One is the total capacity of the processor model. This provides insight into the total amount of work that can be processed on that particular model. The second MIPS rate to be aware of is the MIPS per CPU. This estimate provides insight into the speed of a single CPU. This is needed since it is possible to have a 400 MIPS model composed of 4 CPUs at 100 MIPS each, 8 CPUs at 50 MIPS each, or 12 CPUs at 33 MIPS each, and your work will perform very differently on each of these configurations. You can see some examples of MIPS ratings from our CPU Chart [REF001] in Figure 1.

2 WHY USE VENDOR CLAIMS?
2.1 Alternative Too Expensive

Vendor ratings are the basis of all comparison charts available on the market today because it's simply too expensive for anyone other than the vendor to purchase or obtain access to every processor that's available. The vendors have access to all of their own processors and must make performance runs on all of their own hardware anyway. Sometimes the vendors will have access to their competitor's machines and so can make comparisons between the two with their own workloads.

2.2 Vendor Has Market Goal

In almost every case, vendors know what market they are trying to meet before a model is announced and will set the capacity sometimes before the model is built. One example of this is Amdahl's June 1997 announcement stating they would provide a 75 MIPS uni-processor in 1Q98 and a 100-MIPS or more uni in 1999.

The hardware design team targets the processor speed as they begin the design work and they don't stop design, modification of design, and just plain "tweaking" until the model has reached the targeted capacity. This means that you can normally depend on a model matching the vendor's claims prior to its availability.

Proc Grp  MSU  Avg MIPS  # of CPs  MIPS/CPU  Machine
70        30   166.4     3         55.5      I 9021-832
70        30   166.4     3         55.5      H GX8324
70        30   162.6     3         54.2      I 9021-831
70        29   168.2     4         42.1      I 9021-820
70        29   166.9     3         55.6      A 5995-3570M
60        30   162.6     3         54.2      H GX8314
60        28   162.4     3         54.1      I 9672-R35
60        28   162.4     3         54.1      H Pilot 37
60        26   162.2     4         40.5      I 9672-R44
60        26   162.2     4         40.5      H Pilot 45
60        25   158.4     4         39.6      A GS545

(I = IBM, H = HDS, A = Amdahl)

Figure 2 - Processor Groups

One of the reasons that a vendor will have a fairly specific goal in the capacity of a model is to provide a full range of capacity relative to software pricing. Software pricing is normally based on either processor group or MSUs (millions of service units), with significant software license charge increases with each higher group or range of MSUs.
A vendor wouldn't be wise to provide three models in a series for groups 50, 70, and 70. A better option, which would be attractive to more customers, would be to provide models in groups 50, 60, and 70, even if it meant down-grading one of the models (in this case, taking the smaller group 70 model and downgrading it to fit into the group 60 range). (More about that later.) The same is true of MSU ratings.

Figure 2 shows an extract from our CPU Chart [REF001] that has organized the processors first by software processor group and then by MSUs. Notice that, in some cases, a model will have a higher MIPS rating than a different model in a higher group. A goal of most installations is to obtain the highest MIPS rating for their workloads at the smallest processor group and MSU rating in order to reduce costs. In Figure 2, you can see that the HDS GX8314, the IBM 9672-R35, and the HDS Pilot 37 might be good bargains because they have the highest average capacity within group 60.

Each vendor is concerned with having a model that will provide an easy incremental step in possible upgrades for their customers.

2.3 Performance Guarantees

Another reason to use the vendor's claims is that they will often write "performance guarantees". As a customer, you should demand a contractual performance guarantee from any hardware vendor. The performance guarantee is normally based on capacity between a new model and the user's current model. Performance guarantees, however, are often difficult to negotiate and difficult to refute or confirm. The reasons will become clearer further in this paper and I'll address the performance guarantees again in the summary.

3 HOW DO VENDORS MEET THEIR CLAIMS?

As previously mentioned, the vendors know what capacity they are aiming for in a particular CPU model. As an example, in order to address each area of their target market for the latest Generation 4 models, in June 1997 IBM announced 14 new models ranging from 8 MSUs to 78 MSUs.
Based on our analysis, this corresponds to uni-processor speeds of 48 MIPS, 56 MIPS, 62 MIPS, 66 MIPS, and 72 MIPS. Only certain MP (multiprocessor) models are available for each uni-speed, depending on the target market.

Let's take a look at how a vendor can produce a model that provides a specific speed and, therefore, capacity.

3.1 Chip Sorting

The CMOS processor chips, while designed to be the same speed, in fact turn out to be slightly different. While the vendor requires a minimum speed out of each chip, a few might be much faster than required and a few might be slower. The vendors end up "sorting" the very fast and very slow chips out. The slower chips can be used in the smaller models and the faster chips can be used in the larger, faster models. There will normally be few chips at the higher end, so they will most likely be used for the largest models. Amdahl indicates that while they used chip sorting on their 5995M models, they aren't using it for their Millennium line.

As an example, the fastest IBM Generation 4 (G4) chip is rated at 2.7 nanoseconds and is used in their RY5 (10-way based on a 72 MIPS uni-processor). Compare this to the 3.1 nanosecond chips in the 66 MIPS uni-processor based models (R55 to RX5) and the 3.3 nanosecond chips in the 62 MIPS uni-processor based models (R15 to R45).

In the case of the 2.7 nanosecond chip for the IBM RY5, they were able to take the fastest chips from the chip sorting process and increase the speed by additional cooling using a refrigerant. (IBM states that it is an environmentally safe refrigerant, R134A.)

3.2 System Structure Changes

There are several other things a vendor can do to adjust the speed and capacity of a processor model. The size, placement, and access to the lookaside buffers can be changed. The placement and connections to the CPU chips can be changed. The type of wiring can be changed. The location of instruction sets or microcode can be moved.
The amount of parallelism performed in the instructions or data can be adjusted. The amount and use of high-speed cache can be changed. As seen in IBM's RY5, cooling can be added to increase the speed of the CPU chip.

There are dozens of other changes that can be made in the hardware and microcode that will affect the effective speed of a CPU. Suffice it to say that the vendors have the knowledge and experience to "tweak" these as needed to achieve a specific speed for a machine.

Sometimes a vendor will refer to "degraded" or "down-graded" models (although the labels aren't comforting!) that are needed to fill in a processor range. These might be slower chips or they might contain system structure differences to reduce the effective speed in order to fit the machine into a lower software rating.

Likewise, a "turbo-charged" model might contain faster chips or include additional system structure changes to provide the needed increase in speed.

3.3 MP Effect

The "MP effect" is a term used to describe the overhead seen due to the multiprocessing effects of running multiple CPUs in the same image. For years, the MP effect was a fairly consistent 4-5% per CPU added in bipolar models. That is, if a uni-processor was rated at 50 MIPS and a second CPU was added to the same model, you would see a capacity closer to 95 MIPS than 100 MIPS for the two CPUs. This loss of capacity is referred to as the MP effect and was usually about the same for all processor models.

When IBM moved to CMOS models, however, the MP effect seemed to be more important. As an example, look at the #MIX ITRRs for two ten-ways, the bipolar 9021-9X2 and the CMOS 9672-RX5. The column called MP % shows the percentage of effective MIPS in the MP compared to the total possible MIPS if there were no overhead. From Figure 1, we can see that the 9021-9X2 provides about 465 MIPS, which is 75% of a potential 620 MIPS (10 CPUs at 62 MIPS, the speed of the 711 uni).
The 9672-RX5, on the other hand, only provides about 394 MIPS, which is only 60% of its potential 660 MIPS (10 CPUs at 66 MIPS).

The Rx5 models show the highest MP effect to date, which is one of the reasons, I think, for the interesting series of models that IBM announced in June 1997. The R55 (5-way) to RX5 (10-way) models are based on a 66 MIPS CPU uni, which is faster than IBM's largest bipolar, the 9021-9X2 (at 62 MIPS uni), but the RX5's total capacity of 394 MIPS is far less than that of the 9X2 (465 MIPS in Figure 1) because the RX5 has more MP overhead than the 9X2. So IBM also announced the RY5 10-way at the same time. The RY5 is based on a turbo-charged 72 MIPS uni CPU, and is able to provide a capacity of 439 MIPS, which is much closer to the 9X2. That is, to compensate for the higher MP effect, IBM provided a model with faster CPU chips.

The only reason to be aware of the MP effect is when you are considering the addition of a CPU to a current configuration. From a capacity planning standpoint, you should be aware of the decrease in capacity of the other CPUs. It's not a pricing issue since the prices are adjusted by the vendor based on the effective capacity of the machine.

4 IBM'S LSPR RATINGS

To confirm the speed and capacity of their processor models and to help customers understand what to expect from different models processing their workloads, IBM publishes their Large Systems Performance Reference [REF002], as a manual and as a performance tool. You can also find the LSPR numbers on the Web [REF005]. Both their techniques and results are published in the manual, and I would strongly recommend that you become familiar with their methodology. This section of this paper provides my summary of their 50 page discussion on the technique.

4.1 Workloads

IBM has designed and accumulated a series of traditional workloads they feel are representative of customers' workloads.
The sets of workloads consist of:

CB84 - Commercial Batch Workload

This set of 130 jobs, with 610 unique steps, provides a typical, traditional view of batch work. This workload consists of COBOL, Assembler H, and PL/I programs, along with compilers and utilities such as DFSORT. Access methods for BSAM, QSAM, BDAM, and VSAM are used. This is most representative of the traditional batch applications running in installations today.

CBW2 - Commercial Batch Workload 2

This set of jobs was begun with SP 4.2.2. It has 32 jobs with 157 steps and is more representative of new applications that exploit more ESA functions, such as data in memory. It consists of programs written in C, COBOL, FORTRAN, and PL/I. The steps perform sorting, use DFSMS utilities, compilers, VSAM and DB2 utilities, SQL processing, SLR processing, GDDM graphics, and FORTRAN engineering routines. There is more JES processing and the workload spends about 50% of the time performing DB2 activities.

FPC1 - Engineering/Scientific Batch Workload

This workload is an engineering and manufacturing jobstream that includes "static analysis, dynamic analysis, computational fluid dynamics, nuclear fuel calculations, and circuit analysis." This will be representative of much of the SAS work in commercial installations due to SAS's heavy use of floating point.

TSO

The TSO workload is representative of TSO program development using ISPF/PDF. It includes editing, browsing, foreground compiles, testing, graphics, and Info/Management. 25 different scripts are used and driven by an internal driver to meet the activity required to drive the system to 70% or 90% utilization. TPNS is not used, although IBM periodically uses TPNS to confirm the consistency of their own internal driver.

CICS

In SP 4.2, the CICS workload consisted of 102 transactions and in SP 5.1, the workload consists of 204 transactions.
CICS is run in an MRO configuration (Multiple Region Option) with a TOR (terminal owning region), an AOR (application owning region), and an FOR (file owning region). In SP 4.2.2, an additional AOR/FOR region was added. As many of these "MROplexes" are run as needed to run the system to 70% and 90% utilization, usually one MROplex per one or two CPUs. COBOL and assembler are used for the programs and VSAM is the primary access method. The work is designed to be representative of order entry, stock control, inventory tracking, production specification, hotel reservations, banking, and teller systems.

IMS

The IMS workload is similar to the CICS workload from DLI applications. There are 17 transaction types. Enough Message Processing Regions (MPRs) are run to bring the system to the desired utilization (70% and 90%) without causing contention within an MPR. DLI HDAM and HIDAM access methods are used with VSAM and OSAM databases. In SP 5.1, two IMS control regions are used and data sharing occurs using the IMS Resource Lock Manager (IRLM). BMPs (Batch Message Processing Regions) are not included.

DB2

The DB2 workload consists of seven transactions applied to two applications, inventory tracking and stock control. The DB2 requests are driven by IMS/DC. Enough regions are created to eliminate contention within the subsystems. There are two DB2 databases comprised of 11 tables for inventory and 5 tables for stock control, with 1 to 5 indexes for each table. Since the two workloads don't invoke DB2 sorts, the DB2 sort assist feature available on some models is not exercised.

Only one type of workload is run during an LSPR test and the systems are run at fairly high CPU utilization (close to 100% for batch and FPC1 and at both 70% and 90% utilization for online and TSO). For the online work, the IBM team waits until the system has stabilized before starting the measurement phase.
Machine    TSO    CICS   DB2    IMS    CB84   CBW2   FPC1   #MIX
9021-982   6.00   7.23   6.07   6.18   7.13   7.20   6.95   6.48
9672-R15   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
9672-R55   4.40   4.34   4.02   3.98   4.28   5.02   4.86   4.20
9672-RX5   6.48   6.61   5.92   6.11   6.76   9.09   9.61   6.36

Figure 3 - IBM LSPR ITRRs [from REF002 & REF005]

Two very important items to note are that only one type of workload is run in each test and that the tests are run in totally unconstrained environments. That is, CICS is not tested with TSO and IMS is not tested with batch during the same runs. Also, in order to accurately determine the effect of the processor capacity, IBM must ensure that no other constraints exist on the system. That is, there is virtually no paging due to the abundance of all types of storage, there is no I/O constraint (almost 100% cache), there isn't a lack of VTAM buffers or JES initiators, and even the CPU is not run until it is constrained (it is never run at over 100% busy).

4.2 Measurements

From the measurements made while running these benchmarks, IBM calculates an Internal Throughput Rate (ITR), which is equal to the units of work (jobs or transactions) divided by the processor busy time. Models with higher capacities will be able to process more work in the same amount of processor busy time compared to models with lower capacities and will have higher ITRs. Each workload will have its own ITR. To be able to compare two models, IBM uses an ITRR, Internal Throughput Rate Ratio, which is calculated by taking the ITR for a base model and dividing it into the ITR for the new model. Prior to June 1997, IBM published a list of the ITRRs using their 9021-520 as a base model with the ITR for each workload being set to 1.0. Thus, a model that can process 50% more work in the same amount of CPU time compared to the 520 will have an ITRR of 1.5.

In June 1997, IBM published preliminary LSPR ratings for their newest models using the CMOS 9672-R15 as a base. In August 1997, they republished their LSPR ratings for all models using the R15 as the new base.
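The ITR and ITRR definitions in section 4.2 reduce to two divisions. The following is a minimal sketch; the function names and the measurement values are hypothetical and are not taken from the LSPR.

```python
def itr(units_of_work: float, cpu_busy_seconds: float) -> float:
    """Internal Throughput Rate: work completed per second of processor busy time."""
    if cpu_busy_seconds <= 0:
        raise ValueError("processor busy time must be positive")
    return units_of_work / cpu_busy_seconds


def itrr(itr_new: float, itr_base: float) -> float:
    """Internal Throughput Rate Ratio of a new model relative to a base model."""
    if itr_base <= 0:
        raise ValueError("base ITR must be positive")
    return itr_new / itr_base


# Hypothetical measurements: both models are busy for 100 CPU seconds,
# and the new model completes 50% more transactions in that time.
base_itr = itr(units_of_work=1000, cpu_busy_seconds=100)
new_itr = itr(units_of_work=1500, cpu_busy_seconds=100)
print(itrr(new_itr, base_itr))  # 1.5
```

Because the base model's ITR is the divisor, the base model always has an ITRR of 1.0, which is why every 9672-R15 entry in Figure 3 is 1.00.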
These new ratings were quite a bit different from the 520 ratings because the operating system and subsystem releases used in the LSPR runs were changed at the same time. This led to more than a little confusion. If we take IBM's statement that the R15 is equivalent to the 9021-711, and we also accept the 711 as a 62 MIPS machine, all other machines would see a corresponding 2% to 6% increase in MIPS ratings based on the LSPR ratings!

Figure 3 shows an extract from IBM's LSPR charts for their three models as compared to the 9672-R15. You can interpret the chart as saying that their TSO workload achieved 4.40 times as many transactions in the same amount of processor busy time on the 9672-R55 as compared to the 9672-R15. This is based on the total capacity of the model, not necessarily the speed of a CPU, as we'll see later.

In order to help people consider the capacity based on a mix of workloads, IBM derives an estimated ITRR called #MIX, which consists of 20% of the ITRR of each of the five workloads: CICS, IMS, DB2, TSO, and CB84. This is a calculated value only, and is not confirmed by running 20% of each workload, for which consistency would be next to impossible to achieve.

4.3 How These Are Used

The #MIX, or an early expectation of #MIX, is used to derive the SRM service unit coefficient, the service units per second as published for each model. The SU/Sec is used by SRM to compensate for different speed CPUs when determining the frequency of invoking certain functions. The SU/Sec used to be a fairly good indicator of CPU speed because it is related to the speed of a single CPU. A CPU with an SU/Sec of 400 is roughly twice the speed of a CPU with an SU/Sec of 200. This number is becoming less effective, however, as an indication of CPU speed for several reasons.

First, the SU/Sec value is published and made available often before final LSPR tests have been completed. While the published ITRRs might change, the SU/Sec values are seldom changed.
Secondly, in older models, the difference in speed between workloads was fairly close. With modern processors, the difference in speed between workloads can be over 30%. As an example, in Figure 3, the FPC1 workload on a 9672-RX5 has an ITRR (9.61) that's over 60% higher than DB2 (5.92), and over 50% higher than #MIX (6.36). It would be very difficult to use a single number to indicate the speed of the RX5 for these differing workloads. There's a 14% variation in just the five workloads used to derive the #MIX.

The published #MIX is also used by most of the industry analysts to determine the relative MIPS ratings of different processors. This is an important concept for people that use published MIPS because it means that there could be a 40% or more variance between the published MIPS and what your workload would see. In our CPU Chart, we list estimated MIPS per workload to help people understand the difference that workloads make in estimating the capacity of a specific model.

4.4 Changes After GA

If there are significant performance improvements made available after General Availability (GA) of a model through microcode or other means, IBM has indicated that they will rerun the test and republish the changed ITRRs. They do not expect to alter the SU/Sec values, the processor group ratings, or the MSU ratings.

5 AMDAHL PERFORMANCE CLAIMS

Amdahl has a set of internal benchmark jobs similar to IBM's, but they do not publish a description of their workloads or specific performance claims for each type of workload. They normally publish a range of performance that can be expected for a given model compared to their 5995-4570M. For example, their newly announced CMOS Millennium series contains a model GS745, which is listed as having a performance rating of 1.16 to 1.28 of the Amdahl 5995-4570M.

Since Amdahl does not publish their workloads, we can't be certain which workloads are at which end of the range, although we might expect them to be similar to IBM's workloads.
Most analysts take the midpoint of the high and low to be the average and relate that to IBM's #MIX workload. Whether this is valid or not remains to be seen.

Amdahl has always derived their SU/Sec value a little differently, however. Their logic has been to provide consistent TSO response across a hardware change. In order to do this, the same percent of TSO transactions need to complete in first period. For this to be true, the durations must be adjusted to match the CPU speed. Amdahl assigns a value to the SU/Sec to ensure that the same percent of TSO transactions complete in first period. This has meant that the Amdahl SU/Sec values for bipolars have been higher by 6-8% than corresponding IBM and HDS bipolars. The Amdahl models had SU/Sec values that resulted in calculations of about 52 SU/Sec for each MIPS, while IBM and HDS had closer to 48 SU/Sec for each MIPS.

With CMOS models, however, the vendors are getting closer. The IBM CMOS models are now closer to 51 SU/Sec per MIPS, while the Amdahl models vary from 48 to 52 SU/Sec per MIPS (with a strange anomaly in the GS535, which results in almost 55 SU/Sec per MIPS).

This means two things to you. It is fairly dangerous to try to compare service units between models from different vendors. And it's also dangerous to compare service units between models of widely different ages.

6 HDS PERFORMANCE CLAIMS

HDS uses two techniques for publishing performance ratings. Two series of HDS models, the Gxx series and their CMOS Pilot models, are designed to be directly competitive to corresponding IBM models, and therefore use comparable IBM ratings. The Skyline models, which are based on the fastest CPU speed available today, are not comparable in speed to any IBM or Amdahl CPU, so HDS publishes separate ratings for the Skylines (as well as a few other models that don't have corresponding IBM matches).
The HDS models that are comparable to the IBM models are published by HDS as having "equivalency" to the IBM models and their performance claims are equivalent to IBM's claims. For the few models in these series that do not have a direct equivalent model within the IBM range, HDS publishes a performance range, such as one model might provide 1.2 to 1.4 times the performance of an HDS GX8110.

The Skyline models, which are really combinations of bipolar and CMOS technology, don't relate to an IBM model, but performance claims are published that indicate, for example, that a Skyline is 2.0 times the HDS GX8114. HDS has derived these performance claims by running their own set of benchmark jobs. Neither a description of the jobs nor the resulting measurements are published.

I've noticed that Skyline SU/Sec values range from 48 to 52 SU/Sec per MIPS, so the SU/Sec values might appear higher or lower than service units from other vendors.

7 WHY YOU SHOULD USE THESE CLAIMS

7.1 The Bad News

It's important to understand that there is no measurement in existence that can provide a single rating for a processor model that is indicative of its speed and capacity for a variety of workloads. It's similar to buying a car based on expected mileage. A car might be rated for 20 miles to the gallon, but that is seldom what you will find. You will drive the car much differently from the testers that came up with the initial rating. For example, if you happen to have a lead foot (i.e., drive too fast!), you'll NEVER get the mileage your car is rated for. If you drive it according to their recommended speeds, and in their type of traffic, and on their types of roads, and with the same amount of weight in the car, and with all of the extra equipment turned off, you might be able to come close to their estimate. The same is true of processor models.

With that said, however, I strongly recommend that you use the vendor's claims for sizing a machine, because it is as close as you can get initially.
7.2 Performance Guarantees

I also believe that you should not obtain any hardware without some contractual commitment from the vendor about the performance that you expect to receive from the processor model. Since I know that it's possible to obtain a performance guarantee from a vendor (and also know that it won't be offered unless asked for), I'd recommend that every installation plan to obtain such a guarantee. These guarantees can only be obtained based on the vendor's claims.

Therefore, I think you should trust the vendor to provide the right capacity estimates, but get it in writing!

The trick in any contract is to identify how you and the vendor will agree to the performance that you're getting. This often requires very knowledgeable people on both sides who can understand the difference in performance because your workloads may not match the vendor's workloads.

7.3 Industry Charts Based on Vendor's Claims

Since most of the industry charts of MIPS are based on vendor's claims, almost every company is indirectly using what the vendor has provided.

8 WHY YOUR EXPERIENCE MAY DIFFER

Why wouldn't you get the same performance out of a processor model for your workloads? There are several reasons and I'll address the most common among these:

1. Workloads vary
2. Your workloads don't match the vendor's
3. You measure different things
4. Your mix doesn't match the vendor's
5. The workloads vary throughout the day
6. The volume affects capacity
7. Constraints in software affect capacity
8. Constraints in hardware affect capacity
9. LPAR affects capacity
10. Dispatch priorities affect capacity
11. Software levels affect capacity
12. Levels of PTFs affect capacity
13. Different facilities invoked
14. Amount of storage affects capacity
15. Level of tuning
16. User's behavior changes
17. The one thing that remains consistent is that you will always have change!
18. All of the above

8.1 Workloads Vary

The primary reason that a single performance estimate will not work for most sites is that performance differs for each type of workload. In the newer processors, the range of this difference is getting larger with each new model.

I think that the following summary made from the LSPR manual [REF002] is enlightening and helps provide some insight to what you might expect to see:

a. The actual MIPS rate for a model will, in general, be highest for workloads at the batch end and lowest for workloads at the online end of the spectrum.

b. When comparing n-way models to their corresponding uni-processor model, the actual capacity will be highest for workloads at the batch end and lowest for workloads at the online end of the spectrum.

c. When comparing models with larger high speed buffer caches to those with less, the capacity will be highest for workloads at the online end and lowest for workloads at the batch end of the spectrum.

One problem is that your workloads aren't necessarily designed to meet those same specifications that IBM uses for their LSPR workloads. For example, you might have some TSO users who use a lot of SAS (close to FPC1 workloads), others who access DB2 frequently (close to DB2 workloads), and others who spend the bulk of their time in ISPF (close to TSO workloads). The number of each type of user will determine which part of the scale you're on when evaluating TSO. BMPs (IMS batch programs) may look much like IMS and yet may have many of the characteristics of CBW2. In order to use LSPR effectively, you must be aware of the workload mix you're executing.

8.2 Your Workloads Don't Match the Vendor's

IBM has defined some very specific workloads, and while Amdahl and HDS have their own workloads for testing, we don't know what they are. You will need to determine how well your workload matches the vendor's workloads before you can tell if their estimates will be useful.
Here are a few examples where the performance of some workloads might not meet the vendor's expected performance claims:

1. IBM's TSO workload is an ISPF-based workload that has a large amount of editing and browsing types of transactions. If your workload is primarily FOCUS or ADABAS, then your performance probably won't be the same. FOCUS and ADABAS have characteristics that are much closer to CICS and DB2 than TSO.

2. A few customers found that some batch jobs took much longer than expected when they moved to a CMOS processor from a bipolar. It turned out that the packed decimal instruction set was much slower on the CMOS 9672 models than on the bipolars. Heavy use of packed decimal instructions tends to occur in COBOL programs that use subscripts for heavy table processing and were compiled with a compiler option of 'TRUNC=BIN'. IBM didn't run into this particular combination of heavy packed decimal work because their benchmark programs used indexes rather than subscripts. (I remember teaching students that they should use indexes rather than subscripts back in early 1970, but programmers and even vendors are still using subscripts!) This phenomenon has been significantly improved with some microcode changes, but it still exists in many of the IBM 9672 models and HDS Pilot models. For more information on this, see WSC Flash #9608 and the archives from the Watson & Walker 'Cheryl's List' listserver [REF003].

3. As mentioned earlier in the DB2 workload description, IBM's DB2 transactions don't cause the DB2 Sort Assist facility to be invoked. Since many applications do require a DB2 sort, your workloads could get better or worse performance when moving between processors with or without the sort assist facility.

4. One of the most common problems I've seen recently is a much larger occurrence of work that uses floating point. SAS, for example, uses floating point for most of its work.
Any installation with a large percentage of SAS in its daily processing should consider the FPC1 workload as being more representative of SAS than other workloads. Since FPC1 isn't used to determine the #MIX from IBM, SAS users can be very surprised, as shown by some quite low ITRRs for FPC1 workloads on some models.

8.3 You Measure Different Things

In describing IBM's LSPR technique, I referred to their use of 'processor utilization'. This is all of the captured CPU usage for the measurement interval and includes CPU time consumed by all the system address spaces such as MVS, JES, RACF, VTAM, GRS, CONSOLE, etc., not simply the time recorded by the application in the SMF type 30 (job termination) or type 72 (workload by performance group or service class) records.

IBM can obtain all of the measurements because they run in a dedicated, stand-alone environment. It's much harder for an installation to obtain all of the CPU for a specific workload. For example, if you run TSO and CICS at the same time, how much of MVS, RACF, VTAM, etc. is being used by the TSO workload and how much by the CICS workload? You simply can't tell.

So if you see a CICS ITRR of 1.2 between two machines, does that mean that the 20% faster speed will be seen as reduced CPU time in just CICS, or will part of it be seen as reduced CPU time for MVS? You don't really know, because IBM is really measuring multiple things at one time (that is, the SMF time of the region, MVS, VTAM, initiators, JES, etc.).

8.4 Your Mix Doesn't Match the Vendor's

The published #MIX by IBM and the average performance estimate by Amdahl represent some mix of workloads. In IBM's case, the assumption is that there is 20% TSO, 20% CICS, 20% IMS, 20% DB2, and 20% traditional batch. This isn't representative of any installation that I've ever seen.

So you'll need to determine your own mix. For daytime processing, you might want to look at your peak processing period and determine the makeup of the work at that time.
For example, let's assume you are moving from a 9672-R53 to a 9672-R83 and you run 50% CICS, 10% TSO, 10% batch, and 30% "other things" like MVS, RACF, VTAM, monitors, operations started tasks, and scheduling programs.

When using a variety of work, it's easiest to determine the percent of each type of work during the peak interval (that's when the capacity of the machine is the most important). Simply group MVS and supporting functions with the miscellaneous workloads and use the #MIX ITRR. Let's assume that you had some work on an 8-way 9021-982 and planned to move it to a 10-way 9672-RX5. Also assume that you were running 70% CICS, 10% TSO, and 20% other (MVS) during the peak intervals. From Figure 3, we can calculate the ITRR for CICS to be .91 (6.61 / 7.23), the ITRR for TSO to be 1.11 (6.48 / 6.00), and the ITRR for #MIX to be .98 (6.36 / 6.48). That's 70% at .91, 10% at 1.11, and 20% at .98 (0.70 x .91 + 0.10 x 1.11 + 0.20 x .98) for a combined ITRR of .94.

8.5 The Workloads Vary Throughout the Day

Of course, that's for the typical peak processing time. What about the other times of the day? If you run busiest during daytime processing and are able to complete nightly processing in plenty of time, you can probably simply use the daytime estimates.

But if you have a tight batch window at night, as many installations do, you will need to calculate a daytime ITRR and a nighttime ITRR to better determine the effect of a processor change. It would be quite possible to find a site with a mix of 70% online during one peak hour only to find the mix has shifted to 70% batch in the nighttime peak hour.

As more companies are going to more international processing windows, the variation between day and night processing is reduced. Even so, the online workloads will vary dramatically throughout the day.

8.6 The Volume Affects Capacity

IBM, as do Amdahl and HDS, ensures that the system is running close to capacity during measurement: not exceeding it, and certainly not severely underutilized.
For IBM that means that measurements are taken at close to 100% utilization for batch and FPC1, and at both 70% and 90% for the online workloads.

Your results will almost certainly vary if you run at different utilizations. Frankly, few sites will upgrade to a new machine and immediately run at between 70% and 100% busy. A new machine almost always has excess capacity, and this will affect how much CPU is needed for the workload.

For some models, being underutilized will actually produce worse CPU overhead due to their management of high speed cache and how work is dispatched to the CPUs. Other factors, such as LPAR processing, can add to several "low utilization" effects. For most models, however, being underutilized will result in less CPU time per transaction than the work will see as the system gets busier.

That means that shortly after moving to a new processor, you will tend to see very good performance. As you get more work on the system, which may be many months later, the CPU usage of the system will increase.

In almost every analysis I've made, jobs take more CPU time when the CPU utilization is at its highest. This is often referred to as the "multi-programming" effect. If you measure the data at 50% CPU busy, it will always be in the vendor's favor, because the machine will be able to get the work done in less time than estimated at higher utilizations.

This phenomenon is seen very frequently. An installation that has been severely constrained for months (running well over 100% busy for long periods of time) might replace its current machine with a model that has a higher capacity, so the entire workload can be processed while only running at 60% busy on the new processor. The jobs have been experiencing excessive CPU overhead due to the high utilization and are then moved to an environment where they take less than the vendor's predictions, and it appears that you easily got what you paid for.
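The weighted-mix arithmetic worked through in section 8.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and variable names are hypothetical, the shares and ITRRs are the ones quoted in the 9021-982 to 9672-RX5 example, and the weighting is the simple share-times-ITRR sum used in that example.

```python
def combined_itrr(mix):
    """Weight each workload's ITRR by its share of the peak interval.

    `mix` maps a workload name to a (share, itrr) pair; shares must sum to 1.0.
    """
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"workload shares must sum to 1.0, got {total_share}")
    return sum(share * itrr for share, itrr in mix.values())


# Shares and ITRRs as quoted in the section 8.4 example.
daytime_mix = {
    "CICS": (0.70, 0.91),
    "TSO": (0.10, 1.11),
    "other (#MIX)": (0.20, 0.98),
}

print(f"combined daytime ITRR: {combined_itrr(daytime_mix):.2f}")
```

For a site with a tight batch window (section 8.5), the same function could be called a second time with the nighttime shares and the ITRRs looked up for those workloads, giving separate daytime and nighttime figures.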
Therefore, you will need to wait until you've reached full utilization on your processor before knowing whether you have obtained the processor capacity that you had planned for.

8.7 Constraints in Software Affect Capacity

IBM points out in their LSPR manual that they ensure that no software or hardware constraints exist during their measurement period. In your installation, though, an IPS parameter that's set to limit the amount of work on the system, or poorly structured and managed JES initiators, could seriously affect the capacity of your new machine.

Unfortunately, this happens quite often when an installation upgrades to a new processor. There are several dozen parameters that should be modified when you upgrade to a larger capacity machine. If these aren't modified, you could be restricting the capacity of your new machine. A simple parameter, such as the domain constraints in the IPS, could cause an increase in the amount of swapping, and therefore overhead, in the new model.

8.8 Constraints in Hardware Affect Capacity

IBM eliminates hardware constraints during their testing because they don't want to count the CPU cycles spent dealing with the constraint. For example, they don't want to spend CPU cycles in paging when the intention is to determine the speed of the CPU for a specific type of work.

You should be aware that if you have any hardware constraints, such as a lack of I/O paths, poor cache hit ratios, poorly performing DASD, or storage shortages, you could be limiting the potential capacity of your machine.

8.9 LPAR Affects Capacity

Perhaps the most significant reason that your workloads may not match your vendor's expectations is that all performance claims are made for a non-LPAR environment. The vendors aren't trying to hide anything, but they simply can't account for all of the variations seen in an LPAR configuration.

LPAR processing, whether it's from IBM's PR/SM, Amdahl's MDF, or HDS's MLPF, will take additional cycles for processing time.
A small portion of LPAR processing may be displayed in the partition data available from RMF and CMF, but that is only the LPAR management time and does not include the bulk of the actual overhead. Most LPAR overhead is actually experienced by the workloads, and their CPU time (TCB or SRB) will increase in an LPAR situation.

The amount of increase is quite variable and dependent on several factors. The primary factors are the number of LPARs on the machine, the total number of shared logical CPUs, the ratio of logical to physical CPUs, and the activity in the other LPARs. An increase in any of these four will cause an increase in the CPU time for your work. This CPU time has not been considered in the vendor's announced performance claims (nor can it be). The LPAR overhead could range from as little as 2% (in a production LPAR that's given 95% of the machine) to as much as 25% (in a grossly over-configured, multiple LPAR, multiple CPU environment).

You need to take this into consideration if you are running in any type of LPAR environment.

8.10 Dispatch Priorities Affect Capacity

Because you run a mix of workloads, the dispatch priority you have assigned to these workloads will be more important as you get closer to running your system at full capacity. For example, if batch is running at a low dispatch priority, as it is in most sites, the inconsistent CPU load from your higher priority work, such as CICS and TSO, will cause the batch work to get sporadic, inconsistent access to the CPU. This causes an increase in CPU time that is normally not considered in the vendor's performance claims. That is, if all of your batch work is swapping in and out of storage and moving between multiple CPUs because it doesn't have enough priority to stay on one CPU, you will see increased CPU times in your batch workloads.

8.11 Software Levels Affect Capacity

The vendor's benchmarks are run on a level of MVS software that may or may not match yours.
Until more installations are running the same level of OS/390, it's highly unlikely that the levels of all of your software match those in the vendor's benchmarks. You need to consider not only the level and release of MVS, but also the release level of VTAM, JES, RACF, TSO, ISPF, CICS, DB2, IMS, and other key products in your installation. Of course, the levels and releases of your monitors, scheduling products, etc. should also be considered.

What this means is similar to the discussion in 8.2, where your workloads don't match the vendor's. An example of this is in ISPF. ISPF V4 took a lot more cycles than ISPF V3. If the vendor is using ISPF V4 for the base and you are running ISPF V3, you will probably see a difference in how the TSO workload is affected when moving between two models. That is, the vendor did not measure the impact of ISPF V3; it could have been worse or it could have been better, but only you will know (it won't come out of the benchmarks).

8.12 Levels of PTFs Affect Capacity

Just like software levels and releases, the specific PTFs you have on your system will affect the capacity of the machine. As an example, the Catalog Address Space (CAS) takes a LOT more CPU time in SP 5. If IBM's benchmarks use SP 5, their ITRRs include the impact on CAS when it's moved to another model. If you are still on SP 4, the CPU time for CAS will be trivial and wouldn't be affected by a change to a different model.

There have even been cases where IBM has had to apply some PTFs before running their LSPR tests due to some performance improvements that were related to the hardware.

8.13 Different Facilities Invoked

The biggest problem that I see with current performance guarantees is that they consider older, traditional applications and not the newer applications.
Since the current benchmarks are run on traditional workloads, how will you be able to tell the impact of a new processor model for your new applications such as IBM's Web Server on MVS, their LANServer MVS, object technology with SOM and CORBA, web applications like Java, TCP/IP instead of VTAM, DB2 stored procedures, OpenEdition MVS, MQSeries, and similar new applications?

Likewise, consider the applications that are trying to take advantage of some of the facilities that were new as of SP 4 or 5 and still haven't been used, such as SmartBatch, DB2 Sort Assist, CICS storage protection, LPAR automatic recovery, etc.

One of the newest applications, parallel sysplex, is yet to be considered for the hardware benchmarks. In a parallel sysplex configuration, how much does the processor model affect the communication and overhead to and from the coupling facility?

8.14 Amount of Storage Affects Capacity

This is an old consideration that people often forget. If you have a lot of storage available, you can get two benefits. First, you reduce the overhead of paging and swapping, which only steals cycles from the CPU. Second, applications can take advantage of lots of storage and run in fewer cycles. The sort program is a good example of this. In-core sorts take less CPU time than DASD sorts, while hiperspace sorts can take more CPU time, but less elapsed time.

If you have a lot of storage and use it, you will take the least amount of CPU time per transaction. If you are short on storage, you will end up taking more CPU cycles from productive work and spending them on paging activities.

8.15 Level of Tuning

The level of tuning makes a large difference in the effective capacity of a machine.

The easiest example of this is good blocking. Yes, you've probably heard for years that good blocksizes (half or full track blocks on disk) are the most efficient. And most installations have ensured that production data sets are well blocked.
But in most sites, programmers tend to use a factor of 10 to get blocksizes (80 x 800, 1600 x 16000), which produces very poorly performing jobs. Good blocking could reduce the CPU by 10% to 20%. If you have many of these in your batch workload, the programs aren't running very efficiently and may not be getting the maximum benefit out of the new processor.

A well tuned system will always get the best performance out of a new configuration.

8.16 User's Behavior Changes

One of my favorite true stories is about a system where we improved the response time to a group of users from 10 seconds to sub-second. Within two days, the amount of CPU consumed by that group of users tripled. When I went to ask them why their CPU usage had increased, they interrupted me before I could ask to show me a new trick. They said, "Boy, Cheryl, before that change you made, things were really slow! If we wanted to look up your record, we'd have to type in your full name, 'Watson, Cheryl', then wait forever for a response. Now we just type in 'Wa' and start scrolling until your name comes up. It's SUPER fast now!" For those of you that have experienced the crunch caused by a large amount of VSAM browsing in a CICS application from a LOT of users, you'll understand how distressed I was. To take advantage of the improved response time, they started using a much less efficient technique that cost us quite a few cycles.

Many sites have gotten burned because an improvement in response times caused users to change their behavior. Another common example is seen when TSO users find that the system is so fast they start doing all of their work in foreground rather than submitting batch jobs. This leads to excessively long TSO third period response times and excessive CPU consumption.

8.17 The one thing that remains consistent is that you will always have change!

IBM is fortunate in that they can always provide a consistent, unvarying environment in which to run their benchmarks.
They are able to obtain consistent results from one run to the next.

This is seldom the environment that you can expect to see. The only consistency in most production sites is the inconsistency of the workloads. An entire day of processing can be harmed if a batch job from the nightly cycle abended and must be run during the day with the online workloads. TSO users may all come back from a meeting at the same time and hit the system with double the normal TSO load. The CICS group could change a single parameter in their CICS parameters and increase the CICS CPU time by 5%. The DB2 group could add some indexes and reduce DB2 time by 15%.

Just be assured that you will seldom have two periods of time that are consistent in which to collect your measurements.

8.18 All of the Above

In many cases, some or all of the seventeen documented reasons are interplaying with each other at the same time. Very often, there isn't one reason for a change; there are many at the same time. Measurement metrics may appear to report random numbers, frustrating even the most senior measurement expert. That interplay, in itself, may hide the real cause of underlying problems.

It is sometimes more difficult to recognize the reasons for poor results than it is to fix the problem.

8.19 Summary

Your work may vary considerably from the workloads that were used by the vendor to determine the relative capacity and speed of a new model. It's up to you to determine how each of these factors will affect the real performance you receive.

9 WHAT CAN YOU DO?

You can do two things to ensure that you get your money's worth. You can obtain a performance guarantee from your vendor before deciding on a processor model. And you can measure (and understand) the relative change in capacity after you've moved to the new model.

9.1 Performance Guarantee

Each vendor can (optionally) provide a performance guarantee, but they will almost always qualify the performance as it applies to what they think your workloads will experience.
They have a lot of experience with their own models that isn't published and, in my experience, do a very good job of sizing when they know that a performance guarantee will be used.

Part of the performance guarantee is an agreement on the methodology that will be used to confirm that you receive the performance you expect. Generally, this consists of itemizing your important workloads and specifying their current performance along with the performance expected by the vendor.

Most performance guarantees require that the analysis be done between two environments where all changes have been frozen. That is, new workloads, changes in operating system, parameter changes, etc. are not allowed between the two periods of analysis. Be sure that you can handle this period of time without any system or application changes.

9.2 Measure Your Own System

To understand whether a new processor model is meeting its expectations, you need to measure what you are actually experiencing.

IBM provides one solution for this in "The Complete View" section of chapter 5 of their LSPR manual. As an introduction to their solution, they state that "For a validation to work, there must be a commitment that the workload run on the new processor be the same as that on the old processor. In other words, there should be no shifting of workloads until after the validation is complete."

Their technique is to relate the logical I/Os to the total processor busy over a week of prime shift data.

I've tried this technique and have found that it only worked in a few cases, because users could not make the commitment that the workload would not change during the week directly after a processor upgrade. The workload will almost certainly change after a processor upgrade, and changes will be made by the data center personnel.

I've found more success with the technique of identifying stable job steps and online transactions before the change was made and seeing how they were affected after the change.
This technique was first introduced by Joseph B. Major. Though this technique doesn't take operating system differences into account (such as the effect on JES or RACF), it will definitely show the effect on the application. If you don't have time to write your own programs to find these stable jobs and collect this data, take a look at our latest product, BoxScore. BoxScore identifies and quantifies the effect of any change, such as tuning, Year 2000 conversions, processor upgrades, etc., on stable job steps and transactions. This software is based on research that I've been doing in this area for the past 10 years.

10 SUMMARY

The performance estimates for new processor models from IBM, Amdahl, and HDS provide valuable data to help you understand how much capacity you can expect to see if you move to those new models. This is especially true of the IBM LSPR ratings by workload. We would hope to see workload performance ratings from the other vendors at some point in the future.

Your workloads may not see exactly the same effects because of several factors, among them that your workloads don't mimic the vendor's and that most installations run in an LPAR environment, which may not be considered in performance claims.

To ensure that the vendor will help you if you don't get the performance you expect, I recommend that you ensure that the vendor provides a performance guarantee before delivery of a new model.

You should define a technique to identify the relative effect of any processor model change based on your own workloads, not on estimates from artificial workloads. Remember that the vendor's claims are almost always provided for the optimum environment: one that is well-tuned and running with no constraints in a non-LPAR environment. If you are running in an LPAR, have any constraints, or are not well-tuned, you can't expect to achieve the same performance results.
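The stable-step comparison described in section 9.2 could be sketched roughly as follows. This is a minimal illustration, not Major's published method and not how BoxScore works: the step names, the CPU figures, and the choice of a median after/before ratio are all assumptions made for the example, and selecting which steps count as "stable" is left to the reader.

```python
from statistics import median


def cpu_time_ratios(before, after):
    """Return {step: after_cpu / before_cpu} for steps measured in both periods."""
    common = before.keys() & after.keys()
    return {
        step: after[step] / before[step]
        for step in sorted(common)
        if before[step] > 0
    }


# Hypothetical CPU seconds per execution for steps judged stable.
# These are placeholder values, not real measurements.
before = {"PAYROLL.STEP1": 120.0, "BILLING.SORT": 45.0, "CICS.TRNA": 0.050}
after = {"PAYROLL.STEP1": 100.0, "BILLING.SORT": 36.0, "CICS.TRNA": 0.045}

ratios = cpu_time_ratios(before, after)
for step, ratio in ratios.items():
    print(f"{step}: after/before CPU = {ratio:.2f}")
print(f"median ratio: {median(ratios.values()):.2f}")
```

A ratio below 1.0 means the step used less CPU time after the change. As noted in section 9.2, a comparison like this shows the effect on the application but not on operating system components such as JES or RACF.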
Note: This article, with some modifications, has previously been published in Watson & Walker's BoxScore User's Guide [REF004].

11 BIBLIOGRAPHY

[REF001] Cheryl Watson's Tuning Letter, CPU Chart
[REF002] Large Systems Performance Reference, IBM, SC28-1187
[REF003] Watson, Cheryl, Listserver archives can be found at: http://www.watsonwalker.com/archives.html
[REF004] Watson, Cheryl, BoxScore User's Guide, Watson & Walker, Inc.
[REF005] Web site for LSPR Description and Numbers - http://www.s390.ibm.com/lspr/lspr.html