Design of experiments

The design of experiments (DOE, DOX, or experimental design) is the design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation. The term is generally associated with true experiments, in which the design introduces conditions that directly affect the variation, but may also refer to the design of quasi-experiments, in which natural conditions that influence the variation are selected for observation.

In its simplest form, an experiment aims at predicting the outcome by introducing a change in the preconditions, which is reflected in a variable called the predictor. The change in the predictor is generally hypothesized to result in a change in a second variable, hence called the outcome variable. Experimental design involves not only the selection of suitable predictors and outcomes, but also planning the delivery of the experiment under statistically optimal conditions given the constraints of available resources.

Main concerns in experimental design include the establishment of validity, reliability, and replicability. For example, these concerns can be partially addressed by carefully choosing the predictor, reducing the risk of measurement error, and ensuring that the documentation of the method is sufficiently detailed. Related concerns include achieving appropriate levels of statistical power and sensitivity.
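For illustration, one way the power concern is addressed in practice is to solve for the sample size needed to detect an assumed effect before running the experiment. The Python sketch below uses the statsmodels library for a two-sample comparison; the effect size, significance level, and target power are illustrative assumptions, not values taken from the text above.

# Hedged sketch: per-group sample size needed to detect an assumed
# standardized effect size of 0.5 with 80% power at alpha = 0.05
# in a two-sample t-test. All numbers are illustrative.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group, 1))  # roughly 64 subjects per group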

Correctly designed experiments advance knowledge in the natural and social sciences and engineering. Other applications include marketing and policy making.

Design of Experiments (DOE)

Outline

  1. Introduction
  2. Preparation
  3. Components of Experimental Design
  4. Purpose of Experimentation
  5. Design Guidelines
  6. Design Process
  7. One Factor Experiments
  8. Multi-factor Experiments
  9. Taguchi Methods

In the design of experiments, optimal designs (or optimum designs[2]) are a class of experimental designs that are optimal with respect to some statistical criterion. The creation of this field of statistics has been credited to Danish statistician Kirstine Smith.[3][4]

In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated without bias and with minimum variance. A non-optimal design requires a greater number of experimental runs to estimate the parameters with the same precision as an optimal design. In practical terms, optimal experiments can reduce the costs of experimentation.

The optimality of a design depends on the statistical model and is assessed with respect to a statistical criterion, which is related to the variance matrix of the estimator. Specifying an appropriate model and specifying a suitable criterion function both require understanding of statistical theory and practical knowledge of designing experiments.
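For illustration, the sketch below evaluates one common criterion, the D-criterion det(X'X), for two candidate designs under a simple straight-line model; the model, the design region, and the candidate designs are illustrative assumptions, not part of the text above.

# Hedged sketch: compare two four-run designs for the model y = b0 + b1*x
# on the interval [-1, 1] under the D-criterion det(X'X). A larger
# determinant corresponds to a smaller generalized variance of the
# least-squares estimates. Designs and model are illustrative only.
import numpy as np

def d_criterion(x_levels):
    X = np.column_stack([np.ones_like(x_levels), x_levels])
    return np.linalg.det(X.T @ X)

endpoints = np.array([-1.0, -1.0, 1.0, 1.0])   # all runs at the extremes
spread = np.array([-1.0, -1/3, 1/3, 1.0])      # runs spread evenly

print(d_criterion(endpoints))  # 16.0 -- the better design for this model
print(d_criterion(spread))     # about 8.9 -- less information per run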

Three-point estimation

The three-point estimation technique is used in management and information systems applications for the construction of an approximate probability distribution representing the outcome of future events, based on very limited information. While the distribution used for the approximation might be a normal distribution, this is not always so; for example, a triangular distribution might be used, depending on the application.[1]

In three-point estimation, three figures are produced initially for every distribution that is required, based on prior experience or best-guesses:

  • a = the best-case estimate
  • m = the most likely estimate
  • b = the worst-case estimate.

These are then combined to yield either a full probability distribution, for later combination with distributions obtained similarly for other variables, or summary descriptors of the distribution, such as the mean, standard deviation or percentage points of the distribution. The accuracy attributed to the results derived can be no better than the accuracy inherent in the three initial points, and there are clear dangers in using an assumed form for an underlying distribution that itself has little basis.

Based on the assumption (possibly unwarranted) that a double-triangular distribution governs the data, several estimates are possible. These values are used to calculate an E value for the estimate and a standard deviation (SD) as L-estimators, where:

E = (a + 4m + b) / 6
SD = (b − a) / 6

E is a weighted average that takes into account both the most optimistic and most pessimistic estimates provided. SD measures the variability or uncertainty in the estimate. In the Program Evaluation and Review Technique (PERT) the three values are used to fit a Beta distribution for Monte Carlo simulations.
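For illustration, a minimal Python sketch of this calculation follows; the variable names a, m and b match the list above, and the example inputs are invented.

# Hedged sketch: E = (a + 4m + b) / 6 and SD = (b - a) / 6 from
# best-case a, most-likely m and worst-case b. Inputs are illustrative.
def three_point_estimate(a, m, b):
    e = (a + 4 * m + b) / 6
    sd = (b - a) / 6
    return e, sd

e, sd = three_point_estimate(a=4, m=6, b=14)  # e.g. task duration in days
print(e, sd)  # 7.0 and about 1.67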

The triangular distribution is also commonly used. It differs from the double-triangular distribution in its simple triangular shape and in that the mode does not have to coincide with the median. The mean (expectation) is then:

E = (a + m + b) / 3.

In some applications,[1] the triangular distribution is used directly as an estimated probability distribution, rather than for the derivation of estimated statistics.
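For illustration, the sketch below draws from a triangular distribution used directly in this way and checks the analytic mean (a + m + b) / 3 against a Monte Carlo sample; the parameter values are invented.

# Hedged sketch: sample the triangular distribution directly and compare
# its sample mean with the analytic mean (a + m + b) / 3. Values invented.
import numpy as np

a, m, b = 4.0, 6.0, 14.0               # best case, most likely, worst case
rng = np.random.default_rng(0)
samples = rng.triangular(left=a, mode=m, right=b, size=100_000)

print((a + m + b) / 3)                 # analytic mean: 8.0
print(samples.mean())                  # sample mean, close to 8.0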