Comment Notes from Excel File: Microbial Count Probability.xls
Current version: February 18, 2013
Introduction: Industrial records of microbial counts are notorious for their irregular pattern. Traditionally, the count of the total population (SPC or count of a specific organism or type, e.g., Salmonella, coliforms, thermophiles, anaerobes, etc.) is plotted on a chart in sequential order. The result is a jagged, apparently chaotic time series. Occasionally there can be an unusually high count which has no apparent reason. It has previously been suggested that the probability or future frequency of such an odd high count can be estimated from the past distribution of the fluctuating counts if it can be safely assumed that: 1) the fluctuations are random, 2) the series has no trend and 3) the general sanitary and other conditions that may affect the counts remain about the same. If these conditions are satisfied, and if the counts' distribution can be assumed to be either normal (symmetric) or lognormal (skewed to the right), at least as an approximation, then the probability of a future count exceeding any given threshold can be estimated by the procedure used in this worksheet. (More on the method and the principles on which it is based can be found in the References list.)
Auto-Update: Use it for quick results.
1) Select the 448 example Count data values in column H (cells H2 to H121) and choose 'Clear> Contents' from the 'Edit' menu.
2) Manually enter new Count data directly into column H or paste data copied from another source into column H starting at cell H2 using the 'Paste Special...Values' command on the 'Edit' menu.
3) Enter up to 5 Threshold values applicable to your own data into cells E2, E4, E6, E8 & E10.
4) Click the 'Update Worksheet' button to run the AutoUpdate macro.
5) Compare the Correlation Coefficient of the Count data (cell C8) with the Correlation Coefficient of the Ln(Count) data (cell D8). Choose the distribution (Normal or Lognormal) whose Correlation Coefficient is closest to 1. If neither Correlation Coefficient is close to 1, do not use this program. Verify your choice visually by looking at the two Quantile-Quantile Plots. The chosen distribution should be the one whose Q-Q plotted data most closely resemble a straight line.
6) For the chosen distribution, columns F (Normal) and G (Lognormal) contain probabilities (expressed in %) that the Thresholds listed in column E will be exceeded if conditions (and hence the Counts' distribution) remain unchanged.
Note: This worksheet applies only to Count data that are nomally or lognormally distributed. There are other formal statistical tests for normality not included here. For more details, read the 9-step Manual Update procedure described below. For additional background information, consult the References.
Manual Update: Detailed 9-step instructions to manually update affected cells and charts in the worksheet when the Count data have changed. The AutoUpdate macro performs all these steps automatically.
1) List the experimental data in sequence and plot them.
a) Clear the example Count data values from column H (labeled 'Count (C)' and colored rose) by selecting them and choosing 'Clear> Contents' from the 'Edit' menu.
b) Enter your microbial Count data directly into column H or copy your data (1 column of numeric ASCII text values, 1 measurement per line, no gaps) from another source and choose 'Paste Special...Values' from the 'Edit' menu to paste them into the 'Count (C)' column starting at cell H2. The number (up to 16384) of microbial Count measurements that you pasted is shown in cell B2. If you pasted more than 16384 values you must change the 16385 in the formula in cell B2 to refer to a row number that is not less than the number of the row containing the last value that was pasted.
c) The Count data are plotted against their index numbers (i) on the line chart called 'Food Product #1 SPC'. Select the title and change it to refer to your own data.
d) Click on the chart and choose 'Source Data...' from the 'Chart' menu. Click on the 'Series' tab. Be sure the 'Values:' field specifies every row of Count data that you want to appear on the chart by setting the last row of the 'Values:' field (122 in the example chart) to a row that is 1 greater than the last row containing a Count data value (because row 2, not row 1, contains the first data point and we want to be sure the last point appears on the chart).
2) The natural logarithms of the Count data in column H are automatically calculated in column M (colored light yellow).
3) Plot a histogram bar chart of the Count data and of their natural logarithms. The cells related to the two histograms are in rows 161 through 181 of columns A through G, just below the two histogram bar charts. Click on the Count Histogram bar chart and choose 'Source Data...' from the 'Chart' menu. Click on the 'Series' tab. B167:B173 are the Category (X) cells from the 'Bin Max Value' column and D167:D173 are the Values (Y) cells from the 'Frequency (%)' column used to draw the chart. Modify the last row (173 in the example) to be 166+# of Histogram Bins shown in cell B4. E167:E173 are the Category (X) cells and G167:G173 are the Values (Y) cells used in the Ln(Count) Histogram bar chart. Modify those two last row references as you did for the Count Histogram bar chart. (See reference 7 for more information on deciding how many histogram bins to use.)
4) The arithmetic mean (AVERAGE), standard deviation (STDEV), skewness (SKEW) and correlation coefficient (CORREL) of the Count and Ln(Count) data are updated automatically. These are calculated in cells C2, C4, C6 & C8 (for the Count data) and D2, D4, D6 & D8 (for the Ln(Count) data).
5) Calculate the autocorrelation function of the data and plot it for lags from 1 to up to 25. The absence of any significant autocorrelation is an indication that the Counts are really independent. The method will most probably work satisfactorily even if "mild" autocorrelations exist, but the reliability of the estimates may be affected.
(See References. Also, see http://www.itl.nist.gov/div898/handbook/main.html for more about autocorrelation and autocorrelation plots.)
a) Non-empty white colored cells apply to autocorrelation calculations.
b) If the number of Lags (shown in cell B9) is less than the allowed maximum (25), click on the Autocorrelation Function chart and choose 'Source Data...' from the 'Chart' menu. Click on the 'Series' tab and set the last referenced row (initially 26) to be 1 greater than the value shown in cell B9.
c) Columns M through column (Q+# of Lags) must contain the same number of rows as column H. Clear the excess rows from those columns if they contain more rows than column H. If they contain fewer rows, copy and paste the last row from each column into the number of new rows needed to make each column the same length as column H.
6) Calculate the needed values for and plot a Quantile-Quantile (Q-Q) Plot of the Count and Ln(Count) data. Column I contains the Count data from column K and column J contains the Ln(Count) data from column M but columns I and J are also sorted into ascending order. Column N contains the index number (i) of each data value. Column O contains the inverse of the standard normal cumulative distribution of (i-0.5)/n, where mean=0, standard deviation=1 and n is the # of Count data values (cell B2). The Normal Probability Q-Q Plot plots column I (Sorted Count) vs column O and the Lognormal Probability Q-Q Plot plots column J (Sorted Ln(Count)) vs column O. In order to compare the two Q-Q plots their y-axes must be the same length relative to the data that are plotted on them. Double-click on the chart's y-axis and set the y-axis Scale Minimum and y-axis Scale Maximum of the normal Q-Q Plot to the Min(Count) value (cell B163) and Max(Count) value (cell B165), respectively. Set the y-axis Scale Minimum and y-axis Scale Maximum of the lognormal Q-Q Plot to the Min(Ln(Count)) value (cell E163) and Max(Ln(Count)) value (cell E165), respectively.
7) Specify up to 5 Threshold Count values considered excessive for the organism(s) being counted. Five different example Thresholds are located in cells E2, E4, E6, E8 & E10. Replace them with Threshold values that apply to your own data.
8) Choose the distribution that matches your data.
1) If the Count Histogram has a symmetric bell shape, your data probably have a normal distribution. If the Ln(Count) Histogram has a symmetric bell shape, your data probably have a lognormal distribution.
2) Compare the Quantile-Quantile Plot of the Count data (normal distribution) with the Q-Q plot of the Ln(Count) data (lognormal distribution). Choose the distribution whose Q-Q Plot looks most like a straight line.
Quantitatively: Compare the correlation coefficient of the Count data (cell C8) with the correlation coefficient of the Ln(Count) data (cell D8). Choose the distribution whose correlation coefficient is closest to one. If neither correlation coefficient is close to one, do not use this program.
(See Reference 6 for more about the application of alternative distribution functions. See http://www.itl.nist.gov/div898/handbook/main.html for more about skewness. There are other formal statistical tests for normality that are not included in this worksheet. For example, see Reference 8, http://www.analyse-it.com/anderson-darling-normality-test_y.htm or the link just mentioned above. Also, note that records with a large number of data points are likely to fail a formal test for normality or lognormality.)
9) The worksheet calculates the probability (expressed in %) that the Threshold(s) specified in step 7 will be exceeded if conditions (and hence the Counts' distribution) remain unchanged. Calculations in cells F2, F4, F6, F8 & F10 assume the Count data have a normal distribution. Calculations in cells G2, G4, G6, G8 & G10 assume the Count data have a lognormal distribution. Note that the results are only rough estimates of the order of magnitude of the probabilities of Counts exceeding the specified Thresholds.
** WARNING **
*** Note that the probability results DO NOT INDICATE WHEN a Count exceeding a Threshold will occur! ***
Peleg, M. and Horowitz, J. 2000. On estimating the probability of aperiodic outbursts of microbial populations from their fluctuating counts. Bulletin Mathematical. Biology 62:17-35.
Peleg, M., Nussinovitch, A. and Horowitz, J. 2000. Interpretation and extraction of useful information from irregular fluctuating industrial microbial counts. J. Food Science 65:740-747.
Nussinovitch, A. and Peleg, M. 2000. Analysis of the fluctuating patterns of microbial counts in frozen industrial food products. Food Research International 33:53-62.
Nussinovitch, A., Curasso, Y. and Peleg, M. 2000. Analysis of the fluctuating microbial counts in commercial raw milk - A case study. J. Food Protection 63:1240-1247.
Horowitz, J., Normand, M.D. and Peleg, M. 1999. On modeling the irregular fluctuations in microbial counts. Critical Reviews Food Science Nutrition 39(6):503-517.
Corradini, M.G., Normand, M.D., Nussinovitch A., Horowitz, J. and Peleg, M. 2001. Estimating the frequency of high microbial counts in industrial food products using various distribution functions. J. Food Prot. 65(3):674-681.
Sturges, H. 1926. The Choice of a Class Interval, Journal of the American Statistical Association 21:65-66. (Thanks to Walter A. Rosenkrantz)
Rosenkrantz, W.A. 2000. Confidence bands for quantile functions: A parametric and graphic alternative for testing goodness of fit. The American Statistician 54(3):185-190.
Micha Peleg Tel: 413-545-5852
Professor Fax: 413-545-1262
Mark D. Normand Tel: 413-545-2365
EDP Programmer II Fax: 413-545-1262
Both located at:
Department of Food Science
University of Massachusetts
Amherst, MA 01003-1410
Worksheet on the Web at:
Last modified: February 18, 2013
The research on which this worksheet is based was supported by the USDA NRI (Food Safety) Program. The authors express their gratitude to the USDA for its support and to Prof. Walter A. Rosenkrantz for his helpful technical assistance.
Content last updated: February 19, 2013