# Glossary

Check out our handy glossary where we break down all the fancy lingo from Data, AI and ML for you!

## A-D

### Approximate Bayesian Computation

Approximate Bayesian computation (ABC) is a statistical method used to estimate the parameters of complex models when the likelihood function of the model is intractable or computationally expensive to evaluate. The basic idea behind ABC is to approximate the true posterior distribution using simulated data that is generated from the model with a set of proposed parameter values. The proposed parameter values are accepted or rejected based on how similar the simulated data is to the observed data.

The ABC method bypasses the expensive, or impossible, step of evaluating the likelihood function. Instead, it compares the observed data with data simulated from the model under different parameter values, usually through summary statistics and a distance measure. Parameter values whose simulated data fall within a chosen tolerance of the observed data are kept. Together, the accepted values form an approximate sample from the posterior distribution of the true parameters.
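The simplest variant, rejection ABC, can be sketched as follows. This is a minimal illustration on a made-up toy problem. The data are assumed to be normal with known standard deviation 1, the prior on the mean is assumed to be uniform on [-10, 10], and the sample mean is used as the summary statistic. The function name `abc_rejection` and the tolerance `epsilon` are hypothetical choices for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, epsilon, n_draws, rng):
    """Rejection ABC: keep proposed parameters whose simulated summary lies
    within epsilon of the observed summary. No likelihood is ever evaluated."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                         # 1. propose from the prior
        simulated = simulator(theta, len(observed), rng)   # 2. simulate a dataset
        if abs(summary(simulated) - s_obs) <= epsilon:     # 3. compare summaries
            accepted.append(theta)
    return accepted


# Toy problem (assumption): data ~ Normal(mu, 1), and we want the posterior of mu.
rng = random.Random(0)
observed = [rng.gauss(3.0, 1.0) for _ in range(50)]

accepted = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-10.0, 10.0),
    simulator=lambda mu, n, r: [r.gauss(mu, 1.0) for _ in range(n)],
    summary=statistics.mean,
    epsilon=0.1,
    n_draws=10000,
    rng=rng,
)
print(len(accepted), statistics.mean(accepted))
```

A smaller `epsilon` gives a closer approximation to the true posterior, but fewer proposals are accepted. Choosing informative summary statistics matters as much as choosing the tolerance.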

The method is useful when the likelihood function is complex or intractable, for example in models with many latent variables or in complex stochastic simulation models. It has been used in a variety of fields, including genetics, epidemiology, and ecology, to estimate the parameters of models with complex dynamics.

### ARIMA

ARIMA (AutoRegressive Integrated Moving Average) is a type of statistical model for time series data. It is a combination of three components:

- Autoregression (AR): a model that uses the past values of the time series to predict the current value.
- Integration (I): a differencing step that accounts for non-stationarity by replacing each value with its change from the previous value, which removes trends. Seasonal patterns require seasonal differencing, as in the seasonal extension SARIMA.
- Moving average (MA): a model that uses past errors or residuals of the time series to predict the current value.

By combining these three components, an ARIMA model captures linear dependence on past values and past errors. It can handle a wide range of time series patterns, such as trends and autocorrelated noise, and its seasonal variant (SARIMA) extends it to seasonal patterns.

The ARIMA model is specified by three parameters: (p,d,q) where p is the order of the autoregression component, d is the order of the differencing component, and q is the order of the moving average component.

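To use an ARIMA model, one must first choose the values of (p, d, q) that best fit the data, for example by inspecting autocorrelation plots or comparing information criteria such as AIC. The next step is to estimate the model's coefficients, and the fitted model is then used to forecast future values of the time series.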
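The sketch below covers the difference, fit and forecast workflow for an ARIMA(1,1,0) model, using only the standard library. It is a simplified illustration and not a reference implementation. It assumes that ordinary least squares on the differenced series is an acceptable stand-in for the maximum-likelihood estimation that statistical packages typically use. All function names are hypothetical, and the data are synthetic. In practice, a library class such as `statsmodels.tsa.arima.model.ARIMA` would normally be used instead.

```python
import random


def difference(series, d=1):
    """Apply d rounds of first differencing (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def fit_ar1(x):
    """Least-squares fit of x_t = c + phi * x_{t-1} + e_t (the 'AR' part, p = 1)."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    var = sum((a - mean_prev) ** 2 for a in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_arima_110(series, steps):
    """Forecast with ARIMA(1,1,0): fit AR(1) to the differences, then integrate back."""
    diffs = difference(series, d=1)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # AR(1) step on the differenced scale
        level += last_diff               # undo differencing by cumulative summation
        forecasts.append(level)
    return forecasts, (c, phi)


# Synthetic data (assumption): the differences follow AR(1) with c = 0.5, phi = 0.6.
rng = random.Random(42)
y, d_prev = [100.0], 0.0
for _ in range(2000):
    d_prev = 0.5 + 0.6 * d_prev + rng.gauss(0.0, 1.0)
    y.append(y[-1] + d_prev)

forecasts, (c_hat, phi_hat) = forecast_arima_110(y, steps=5)
print(round(c_hat, 2), round(phi_hat, 2), [round(f, 1) for f in forecasts])
```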

ARIMA models are widely used in fields such as economics, finance, and engineering. They are considered a powerful tool for time series forecasting, but they also require a certain level of expertise and experience to use effectively.

### Auto-encoder

An autoencoder is a type of neural network that is trained to reconstruct its input. It consists of two main components: an encoder and a decoder. The encoder maps the input data to a lower-dimensional representation, called the bottleneck or latent representation. The decoder then maps this latent representation back to the original input space. The goal of training an autoencoder is to minimize the difference between the original input and the output of the decoder.

One main use of autoencoders is dimensionality reduction. The encoder learns a compact representation of the input data, which can then be used for tasks such as data compression, feature extraction, and anomaly detection.
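To illustrate the encoder, bottleneck, decoder idea, here is a minimal sketch of a linear autoencoder that compresses 2-D points to a 1-D code. It is deliberately simplified and assumes no biases, no nonlinearities, and full-batch gradient descent on the mean squared reconstruction error. The function names and the synthetic data are illustrative. A linear autoencoder like this learns the same subspace as PCA, and real autoencoders add nonlinear layers.

```python
import random


def reconstruct(x, w_enc, w_dec):
    """Encode a 2-D point to a 1-D code z, then decode z back to 2-D."""
    z = w_enc[0] * x[0] + w_enc[1] * x[1]
    return z, (w_dec[0] * z, w_dec[1] * z)


def mse(data, w_enc, w_dec):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        _, x_hat = reconstruct(x, w_enc, w_dec)
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train_linear_autoencoder(data, lr=0.05, epochs=300, seed=0):
    rng = random.Random(seed)
    w_enc = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    w_dec = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    n = len(data)
    for _ in range(epochs):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z, x_hat = reconstruct(x, w_enc, w_dec)
            r = (x_hat[0] - x[0], x_hat[1] - x[1])     # reconstruction error
            back = w_dec[0] * r[0] + w_dec[1] * r[1]   # error propagated back to z
            for i in range(2):
                g_dec[i] += 2 * r[i] * z / n
                g_enc[i] += 2 * back * x[i] / n
        for i in range(2):
            w_dec[i] -= lr * g_dec[i]
            w_enc[i] -= lr * g_enc[i]
    return w_enc, w_dec


# Synthetic 2-D data lying close to the line x1 = x2, so one latent value suffices.
rng = random.Random(1)
data = []
for _ in range(200):
    t = rng.gauss(0.0, 1.0)
    data.append((t + rng.gauss(0.0, 0.05), t + rng.gauss(0.0, 0.05)))

w_enc, w_dec = train_linear_autoencoder(data)
print(mse(data, w_enc, w_dec))
```

After training, each point is stored as a single number `z` and still reconstructs well. A point with a large reconstruction error would be a candidate anomaly.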

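Another use is generative modeling, where the decoder generates new data similar to the training data. This works best with variants designed for it, such as variational autoencoders (VAEs), because the latent space of a plain autoencoder is not structured for sampling.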

Autoencoders can also be used for denoising. Noise is added to the input data, and the network is trained to reconstruct the original, clean data from the noisy input.

Overall, autoencoders are a powerful tool for unsupervised learning, because they can learn useful representations of the data without labeled examples.

### Bayesian statistics

Bayesian statistics is a branch of statistics based on the Bayesian interpretation of probability, which views probability as a degree of belief in an event or proposition. In Bayesian statistics, these probabilities are updated as new data are observed. This contrasts with classical (frequentist) statistics. There, probability is interpreted as a long-run frequency, and model parameters are treated as fixed, unknown constants rather than as quantities with probability distributions.

One of the key concepts in Bayesian statistics is the use of prior distributions, which are probability distributions that represent our initial beliefs about the parameters of a model before any data is observed. As new data is observed, the prior distributions are updated using Bayes' theorem to create posterior distributions, which represent our updated beliefs about the parameters of the model.

Bayesian statistics is used in a wide range of fields, such as economics, engineering, and the natural and social sciences. It is particularly useful in problems where the number of parameters is large or the data is noisy, and it can also be used in cases where the underlying probability distributions are not well-understood or are difficult to specify.

### Bayesian Linear Regression

Bayesian linear regression is a type of linear regression analysis in which the statistical analysis is undertaken within the context of Bayesian inference. In this approach, the parameters of the linear regression model are treated as random variables, and prior distributions are placed on these parameters. The data is then used to update the prior distributions, leading to posterior distributions for the parameters. These posterior distributions can be used to make probabilistic predictions about the response variable, rather than just point estimates. This allows for the quantification of uncertainty in the model and the ability to make predictions that take into account this uncertainty.
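For the special case of a Gaussian prior and Gaussian noise, the posterior is available in closed form. The sketch below assumes a single feature, no intercept, a known noise standard deviation, and a normal prior on the slope `w` in `y = w * x + noise`. The function names and the data are hypothetical and kept small enough to check by hand.

```python
def posterior_slope(xs, ys, noise_sd, prior_mean, prior_sd):
    """Gaussian posterior over the slope w in y = w * x + noise (noise_sd known)."""
    precision = 1 / prior_sd ** 2 + sum(x * x for x in xs) / noise_sd ** 2
    weighted = prior_mean / prior_sd ** 2 + sum(x * y for x, y in zip(xs, ys)) / noise_sd ** 2
    return weighted / precision, 1 / precision  # posterior mean and variance


def predictive(x_new, post_mean, post_var, noise_sd):
    """Posterior predictive mean and variance of y at x_new."""
    # Variance = observation noise + uncertainty about the slope itself.
    return post_mean * x_new, noise_sd ** 2 + x_new ** 2 * post_var


post_mean, post_var = posterior_slope([1, 2, 3], [2, 4, 6],
                                      noise_sd=1.0, prior_mean=0.0, prior_sd=1.0)
pred_mean, pred_var = predictive(4.0, post_mean, post_var, noise_sd=1.0)
print(post_mean, post_var, pred_mean, pred_var)
```

Here the posterior mean slope, 28/15 (about 1.87), is pulled from the least-squares value 2 toward the prior mean 0. The predictive variance at `x = 4` combines the noise variance with the remaining uncertainty about the slope.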

### Bayes' theorem

Bayes' theorem is a fundamental result in probability theory that relates the conditional probability of an event A given another event B to the reverse conditional probability of B given A. It is named after the Reverend Thomas Bayes, an 18th-century statistician and theologian. The theorem is stated as follows:

P(A|B) = P(B|A) * P(A) / P(B)

where:

- P(A|B) is the conditional probability of event A given that event B has occurred, also known as the posterior probability.
- P(B|A) is the conditional probability of event B given that event A has occurred, also known as the likelihood.
- P(A) is the prior probability of event A, which represents our initial belief about the probability of A before any data is observed.
- P(B) is the marginal probability of event B, obtained by summing (or integrating) the likelihood times the prior over all possible values of A. For a binary event, P(B) = P(B|A) * P(A) + P(B|not A) * P(not A).

The theorem therefore states that the posterior probability is proportional to the product of the likelihood and the prior probability. The marginal probability P(B) acts as the normalizing constant that makes the posterior probabilities sum to 1.

Bayes' theorem provides a principled way to update our beliefs about an event in light of new data. It is used in many fields, such as statistics, machine learning, artificial intelligence, and natural language processing.
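A classic worked example is a diagnostic test. The numbers below are hypothetical: a 1% base rate, 95% sensitivity, and a 5% false-positive rate. A is "has the condition" and B is "tests positive". The function name is illustrative.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes' theorem for a binary event A."""
    # Marginal P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b


p = posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(round(p, 3))  # 0.161
```

Even with an accurate test, a positive result only raises the probability to about 16%, because the prior (the base rate) is so low.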

### Bayesian inference

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability of a hypothesis as more evidence or information becomes available. Bayes' theorem expresses the probability of a hypothesis H given some evidence E, P(H|E), in terms of three quantities: the prior probability of the hypothesis, P(H), the likelihood of the evidence given the hypothesis, P(E|H), and the marginal probability of the evidence, P(E). In Bayesian inference, the prior probability is updated as new evidence is considered, and the updated probability is called the posterior probability. The process of updating the prior probability in light of new evidence is known as "Bayesian updating."
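Bayesian updating can be sketched with a small discrete example. Three hypothetical candidate values for a coin's probability of heads start with equal prior belief. Each flip is then processed in turn, and the posterior after one flip becomes the prior for the next. The hypotheses, the flips, and the helper name `update` are made up for illustration.

```python
def update(prior, likelihoods):
    """One step of Bayesian updating over a discrete set of hypotheses."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(unnormalized.values())          # P(E), the normalizing constant
    return {h: v / evidence for h, v in unnormalized.items()}


hypotheses = [0.3, 0.5, 0.8]            # hypothetical values of P(heads)
belief = {h: 1 / 3 for h in hypotheses}  # uniform prior
flips = [1, 1, 0, 1, 1, 1, 0, 1]         # 1 = heads (made-up data)

for flip in flips:
    # Likelihood of this single flip under each hypothesis.
    belief = update(belief, {h: h if flip else 1 - h for h in hypotheses})

print({h: round(p, 3) for h, p in belief.items()})
```

Updating flip by flip gives the same posterior as a single update with the likelihood of all eight flips at once. This is why evidence can be folded in incrementally as it arrives.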

### Cox's theorem

Cox's theorem, also known as the Cox-Jaynes theorem, is a result in the foundations of probability theory that relates the concept of probability to the concept of logic. Consider any system for assigning real-valued degrees of plausibility to propositions that satisfies a small set of reasonable requirements, often called Cox's postulates or desiderata. The theorem states that every such system must be equivalent to probability theory, up to a rescaling.

The postulates are qualitative rather than numerical. Degrees of plausibility are represented by real numbers. They agree with common sense, so that, for example, stronger evidence for a proposition should not make it less plausible. They are also consistent with the laws of Boolean logic, so that reaching the same conclusion by different routes gives the same answer. Non-negativity, and the requirement that the probabilities of all possible outcomes sum to 1, are not assumed; they follow from the theorem.

From these postulates, Cox showed that plausibilities must obey the product rule and the sum rule of probability. This means that probability can be thought of as an extension of logic to reasoning under uncertainty, rather than as something independent of it.
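In the usual notation, where p(A | C) denotes the plausibility of proposition A given background information C, the product rule and the sum rule that Cox's theorem arrives at read:

```latex
\begin{aligned}
p(A \land B \mid C) &= p(A \mid C)\, p(B \mid A \land C) && \text{(product rule)} \\
p(A \mid C) + p(\lnot A \mid C) &= 1 && \text{(sum rule)}
\end{aligned}
```

Bayes' theorem follows directly from the product rule by writing p(A ∧ B | C) in both orders and equating the two expressions.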

The theorem is often cited as a mathematical foundation for the subjective (Bayesian) interpretation of probability, because it formalizes probability as a degree of belief.

It was developed by Richard T. Cox in 1946, and later expanded upon by Edwin T. Jaynes in 1957.

### Conjugate Prior

A conjugate prior is a prior probability distribution that, for a given likelihood function, yields a posterior distribution in the same family as the prior. For example, a Beta prior combined with a binomial likelihood produces a Beta posterior, and a Gamma prior combined with a Poisson likelihood produces a Gamma posterior. Because the posterior has a known form, updating reduces to simple changes in the distribution's parameters. This gives closed-form solutions for the posterior without numerical integration, which makes conjugate priors a convenient tool for Bayesian inference.
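The Gamma-Poisson pair suits count data such as daily demand. The sketch below applies the closed-form update, assuming a Gamma prior with shape `alpha` and rate `beta`. It then checks the result against a brute-force grid posterior that ignores conjugacy entirely. The counts are hypothetical, and the function names are illustrative.

```python
import math


def gamma_poisson_posterior(alpha, beta, counts):
    """Gamma(alpha, rate=beta) prior + Poisson counts -> Gamma(alpha + sum, beta + n)."""
    return alpha + sum(counts), beta + len(counts)


def grid_posterior_mean(alpha, beta, counts, upper=20.0, step=1e-3):
    """Brute-force check: normalize prior x likelihood on a grid of rates."""
    s, n = sum(counts), len(counts)
    total = weighted = 0.0
    for i in range(1, int(upper / step)):
        lam = i * step
        # log Gamma prior kernel + log Poisson likelihood kernel (constants dropped)
        log_post = (alpha - 1) * math.log(lam) - beta * lam + s * math.log(lam) - n * lam
        w = math.exp(log_post)
        total += w
        weighted += w * lam
    return weighted / total


counts = [3, 5, 4, 6, 2]  # hypothetical daily demand counts
a_post, b_post = gamma_poisson_posterior(2.0, 1.0, counts)
print(a_post, b_post, a_post / b_post, grid_posterior_mean(2.0, 1.0, counts))
```

The two posterior means agree, but the conjugate version needs only an addition instead of thousands of density evaluations.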

### Croston's method

Croston's method is a forecasting technique for intermittent demand. Intermittent demand refers to situations where demand occurs only sporadically, so that many periods have zero demand. Examples of products that can exhibit intermittent demand include seasonal items, fashion products, and products used in maintenance or repair.

Croston's method splits the demand history into two series: the sizes of the non-zero demands, and the intervals (in periods) between consecutive demands. It smooths each series separately with simple exponential smoothing, and it updates both estimates only in periods when demand occurs.

Let α be the smoothing parameter, z the smoothed demand size, and p the smoothed interval. When a demand of size d occurs q periods after the previous one, the estimates are updated as z = z + α(d - z) and p = p + α(q - p). In periods with zero demand, neither estimate changes.

The forecast of demand per period is the ratio z / p. Equivalently, 1 / p estimates the probability that demand occurs in a given period, and z estimates the size of the demand when it does occur.
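The update rules can be sketched in a few lines. This version follows a common convention of initializing both estimates from the first non-zero demand, but initialization choices vary between implementations. The function name and the default `alpha=0.1` are illustrative assumptions.

```python
def croston(demand, alpha=0.1):
    """Return the per-period Croston forecast after processing a demand history."""
    size = None        # smoothed non-zero demand size (z)
    interval = None    # smoothed number of periods between demands (p)
    periods_since = 0  # periods since the previous demand (q)
    for d in demand:
        periods_since += 1
        if d > 0:
            if size is None:
                # Initialize from the first observed demand.
                size, interval = float(d), float(periods_since)
            else:
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 0
        # Zero-demand periods leave both estimates unchanged.
    if size is None:
        return 0.0  # no demand observed yet
    return size / interval


print(croston([0, 0, 5, 0, 0, 5, 0, 0, 5]))  # about 1.67 units per period
```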

Croston's method has been shown to be effective in forecasting intermittent demand. It is a standard alternative to traditional forecasting methods such as moving averages or simple exponential smoothing, which tend to perform poorly when many periods have zero demand.

### Demand Forecasting

Demand forecasting is the process of using historical data and other factors to predict future demand for a product or service. In supply chain management, demand forecasting can be used to optimize inventory levels, plan production schedules, and make informed decisions about pricing and promotions. With accurate demand forecasting, businesses can reduce waste and improve customer satisfaction by having the right products in stock at the right time.

## E-H

### Fourier Transformation

A Fourier transformation (or Fourier transform) is a mathematical technique used to transform a function of time, or of space, into a function of frequency. This allows one to analyze the frequency components of a signal, such as a sound or an image, and to manipulate or filter those components as needed. The Fourier transform is a powerful tool in many areas of science and engineering, including signal processing, image processing, and telecommunications.
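For sampled signals, the discrete Fourier transform (DFT) plays this role. The sketch below evaluates the DFT directly from its definition and uses it to recover the frequencies hidden in a synthetic signal. The signal is made up: a 5-cycle sine plus a weaker 12-cycle sine over 64 samples.

```python
import cmath
import math


def dft(x):
    """Direct DFT: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


n = 64
signal = [math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n)
          for t in range(n)]
spectrum = dft(signal)
magnitudes = [abs(c) for c in spectrum[: n // 2]]  # real input: keep non-negative frequencies
peak = max(range(len(magnitudes)), key=magnitudes.__getitem__)
print(peak)  # 5
```

The spectrum has a large spike at frequency bin 5 and a smaller one at bin 12, while the other bins are essentially zero. The time-domain signal itself does not make this structure visible.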

### Fast Fourier Transform

The Fast Fourier Transform (FFT) is an efficient algorithm for computing the discrete Fourier transform (DFT) of a sequence, or its inverse. The DFT is a way of representing a signal in the frequency domain, by expressing it as a sum of complex exponentials at different frequencies. The FFT algorithm reduces the computational complexity of the DFT from O(n^2) to O(n log n), making it much faster for large sequences. This makes it a very useful tool in many applications where the DFT is needed, such as signal and image processing, and scientific computing. There are several different algorithms that can be used to compute the FFT, but the most common one is the Cooley-Tukey algorithm.
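The core idea of the Cooley-Tukey algorithm can be shown with a minimal recursive radix-2 version. It splits the input into even- and odd-indexed samples, transforms each half, and combines the two halves with "twiddle factors". The sketch assumes the input length is a power of two and checks itself against a direct O(n^2) DFT. The function names are illustrative, and production code would use an optimized library routine such as `numpy.fft.fft`.

```python
import cmath
import random


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; the input length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def naive_dft(x):
    """Direct O(n^2) evaluation of the DFT definition, used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


rng = random.Random(0)
signal = [rng.uniform(-1.0, 1.0) for _ in range(128)]
max_error = max(abs(a - b) for a, b in zip(fft(signal), naive_dft(signal)))
print(max_error)  # tiny floating-point differences only
```

Each level of recursion does O(n) work, and there are log2(n) levels. This is where the O(n log n) cost comes from.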

## I-L

## M-P

### Markov Chain Monte Carlo

Markov Chain Monte Carlo (MCMC) is a family of methods for sampling from a probability distribution, particularly a high-dimensional one, when direct sampling is difficult or impossible. It works by constructing a Markov chain, a sequence of samples in which each sample depends only on the current state and not on earlier states, whose stationary distribution is the target distribution. Once the chain has run long enough to forget its starting point (the burn-in period), its samples can be treated as correlated draws from the target distribution.

MCMC methods are used to approximate complex multi-dimensional distributions by generating samples from them. The most popular MCMC methods are the Metropolis-Hastings algorithm and the Gibbs sampler.

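The Metropolis-Hastings algorithm draws a candidate sample x' from a proposal distribution q(x'|x) that depends on the current sample x. The candidate is accepted with probability min(1, π(x') q(x|x') / (π(x) q(x'|x))), where π is the target density. The target density only needs to be known up to a normalizing constant. If the candidate is rejected, the chain stays at x.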
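A minimal random-walk Metropolis-Hastings sampler looks like this. It assumes a symmetric Gaussian proposal, so the q terms cancel in the acceptance ratio. The target is an illustrative normal distribution with mean 3 and standard deviation 2, given only up to its normalizing constant. The step size, burn-in length, and function names are arbitrary choices for this sketch.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x, log_p = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Symmetric proposal: the acceptance probability reduces to min(1, pi(x') / pi(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


def log_target(x):
    # Unnormalized log density of Normal(mean=3, sd=2); the constant is never needed.
    return -((x - 3.0) ** 2) / (2 * 2.0 ** 2)


rng = random.Random(7)
draws = metropolis_hastings(log_target, x0=0.0, n_samples=50000, step=2.5, rng=rng)[1000:]  # drop burn-in
mean = sum(draws) / len(draws)
sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / len(draws))
print(round(mean, 2), round(sd, 2))
```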

The Gibbs sampler is a special case of the Metropolis-Hastings algorithm that generates a new sample by sampling from the conditional distributions of each variable given the current values of the other variables.
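A textbook case for the Gibbs sampler is a standard bivariate normal with correlation `rho`. Here each full conditional is itself normal: x given y is N(rho * y, 1 - rho^2), and y given x is defined symmetrically. The sketch alternates draws from these conditionals. The choice `rho = 0.8`, the sample size, and the function name are illustrative assumptions.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, rng, x0=0.0, y0=0.0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    cond_sd = math.sqrt(1 - rho ** 2)
    x, y = x0, y0
    xs, ys = [], []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, cond_sd)  # draw x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_sd)  # draw y | x ~ N(rho * x, 1 - rho^2)
        xs.append(x)
        ys.append(y)
    return xs, ys


rng = random.Random(3)
xs, ys = gibbs_bivariate_normal(rho=0.8, n_samples=20000, rng=rng)

# Check the samples against the target's correlation.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
sample_corr = cov / (sx * sy)
print(round(sample_corr, 3))
```

Every draw is accepted, which is what makes Gibbs sampling a special case of Metropolis-Hastings with acceptance probability 1. The trade-off is that the chain moves slowly when the variables are strongly correlated.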

Both methods are used to explore the state space of a target distribution and estimate its properties by creating a large number of samples from it.

### ML (Machine Learning)

Machine Learning (ML) is a subset of AI that deals with the development of algorithms and statistical models that enable machines to improve their performance with experience. In supply chain management and forecasting, ML can be used for demand forecasting, inventory optimization, and anomaly detection. For example, a machine learning model can analyze historical sales data and inventory levels to predict future demand and support stocking decisions.

## Q-T

### Strong Prior

A strong prior, also known as an informative prior, is a probability distribution that contains a significant amount of prior information or assumptions about the true value of a parameter. It is used in Bayesian inference to incorporate existing knowledge or expert opinion into the analysis.

A strong prior can take several forms. It may be a distribution tightly concentrated around a point estimate, a specific parametric distribution with small variance, or a set of constraints on the parameter. It can be based on previous research, external data, or expert judgment.

For example, if previous research suggests that a model parameter is close to a specific value, a researcher can center a prior with a small variance on that value. Similarly, if an external dataset gives a rough estimate of the parameter, that dataset can be used to set the prior's location and spread.

When a strong prior is used, it can have a significant impact on the final result of the analysis, as it can constrain the possible values of the parameter and reduce the uncertainty of the posterior distribution.
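This effect is easy to see with a Beta prior on a success probability, which is conjugate to binomial data. The sketch compares a hypothetical strong prior, Beta(50, 50), concentrated around 0.5, with a flat Beta(1, 1) prior, after the same made-up data of 8 successes in 10 trials. The function names are illustrative.

```python
def beta_binomial_update(a, b, successes, failures):
    """Beta(a, b) prior + binomial data -> Beta(a + successes, b + failures) posterior."""
    return a + successes, b + failures


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5


successes, failures = 8, 2  # hypothetical data: 8 successes in 10 trials
strong = beta_mean_sd(*beta_binomial_update(50, 50, successes, failures))
flat = beta_mean_sd(*beta_binomial_update(1, 1, successes, failures))
print(f"strong prior: mean={strong[0]:.3f}, sd={strong[1]:.3f}")
print(f"flat prior:   mean={flat[0]:.3f}, sd={flat[1]:.3f}")
```

With the strong prior, the posterior mean stays near 0.53 and the posterior is narrow. With the flat prior, the posterior mean moves to 0.75 and the posterior is much wider. If the true value were near 0.8, the strong prior would have biased the estimate.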

Keep in mind that strong priors can introduce bias into the analysis if the prior information is inaccurate or irrelevant. A strong prior should therefore be well-justified and supported by the available evidence.

## U-Z

### Uninformative Priors

Uninformative priors, also known as objective or non-informative priors, are probability distributions designed to carry as little prior information as possible about the true value of a parameter. They are used in Bayesian inference to minimize the bias or preconceptions introduced into the analysis.

There are different ways to construct uninformative priors, depending on the specific problem and the type of data. Some common examples include:

- The flat (uniform) prior, which assigns equal density to all possible values of the parameter. Over an unbounded parameter space it does not integrate to 1, which makes it an improper prior.
- The Jeffreys prior, which is proportional to the square root of the determinant of the Fisher information matrix. Its key property is invariance under reparameterization: transforming the parameter gives the same prior as computing the Jeffreys prior directly for the new parameter.
- The reference prior, which is chosen to maximize the expected information that the data provide about the parameter (the expected Kullback-Leibler divergence from prior to posterior), so that the data dominate the posterior.
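As a worked example of the Jeffreys prior, consider a single Bernoulli observation x with success probability θ. The Fisher information follows from the second derivative of the log-likelihood:

```latex
\begin{aligned}
\ell(\theta) &= x \log\theta + (1 - x)\log(1 - \theta), \qquad x \in \{0, 1\} \\
I(\theta) &= -\mathbb{E}\!\left[\frac{\partial^2 \ell}{\partial \theta^2}\right]
           = \mathbb{E}\!\left[\frac{x}{\theta^2} + \frac{1 - x}{(1 - \theta)^2}\right]
           = \frac{1}{\theta} + \frac{1}{1 - \theta}
           = \frac{1}{\theta(1 - \theta)} \\
\pi_J(\theta) &\propto \sqrt{I(\theta)} = \theta^{-1/2}(1 - \theta)^{-1/2}
\end{aligned}
```

The result is the kernel of a Beta(1/2, 1/2) distribution. It is a proper prior, and it is not uniform: it places more density near 0 and 1.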

The main idea behind uninformative priors is that they should have minimal influence on the final result of the analysis. The data, through the likelihood, should then be the main source of information used to infer the parameter of interest.

Note that, despite the name, an uninformative prior is not always a uniform distribution. The term indicates that the prior is meant to carry little information about the parameter of interest, beyond being a valid probability distribution.

### XGBoost

XGBoost is an open-source software library for gradient boosting on decision trees. It implements the gradient boosting framework with a focus on speed and scalability, including support for parallel and distributed training. XGBoost is known for its high performance and efficiency on machine learning tasks such as classification and regression. It is particularly popular in Kaggle competitions, where it has been used in many winning solutions. Its main features include native handling of missing values, support for parallel and distributed computing, and built-in regularization that helps prevent overfitting. This regularization consists of L1 and L2 penalties on leaf weights and a minimum loss reduction required to make a split. XGBoost also has many hyperparameters that can be tuned to improve performance and further control overfitting.
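To make the built-in regularization concrete, the sketch below hand-codes the regularized leaf-weight and split-gain formulas described in the XGBoost paper and documentation. It uses squared-error loss, so each gradient is `pred - y` and each hessian is 1, and it searches for a split on a single feature. This illustrates the math only; it is not the `xgboost` library's code or API. The function names are hypothetical. The parameters `reg_lambda` and `gamma` only mirror the roles of XGBoost's `lambda` (L2 penalty on leaf weights) and `gamma` (minimum loss reduction to split) parameters.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda):
    """Optimal leaf value w* = -G / (H + lambda) under the L2-regularized objective."""
    return -grad_sum / (hess_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma):
    """Loss reduction from splitting a node, minus the per-split penalty gamma."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma


def best_split(x, y, pred, reg_lambda=1.0, gamma=0.0):
    """Exact greedy search over one feature; loss 1/2 (y - pred)^2, so g = pred - y, h = 1."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    grads = [pred[i] - y[i] for i in order]
    hess = [1.0] * len(order)
    G, H = sum(grads), sum(hess)
    best = (float("-inf"), None)
    g_left = h_left = 0.0
    for pos in range(len(order) - 1):
        g_left += grads[pos]
        h_left += hess[pos]
        if x[order[pos]] == x[order[pos + 1]]:
            continue  # cannot split between equal feature values
        gain = split_gain(g_left, h_left, G - g_left, H - h_left, reg_lambda, gamma)
        if gain > best[0]:
            best = (gain, (x[order[pos]] + x[order[pos + 1]]) / 2)
    return best


gain, threshold = best_split([1, 2, 3, 4], [1, 1, 5, 5], [0.0] * 4)
left, right = leaf_weight(-2.0, 2.0, 1.0), leaf_weight(-10.0, 2.0, 1.0)
print(threshold, round(gain, 4), left, right)
```

With `reg_lambda=1`, the leaf values 2/3 and 10/3 are shrunk toward zero compared with the raw residual means of 1 and 5. A split is worth making only if its gain exceeds `gamma`.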