Title: | Evaluation of Tweedie Exponential Family Models |
---|---|
Description: | Maximum likelihood computations for Tweedie families, including the series expansion (Dunn and Smyth, 2005; <doi:10.1007/s11222-005-4070-y>) and the Fourier inversion (Dunn and Smyth, 2008; <doi:10.1007/s11222-007-9039-6>), and related methods. |
Authors: | Peter K. Dunn [cre, aut] |
Maintainer: | Peter K. Dunn <[email protected]> |
License: | GPL (>=2) |
Version: | 2.3.5 |
Built: | 2024-10-24 05:44:04 UTC |
Source: | https://github.com/peterkdunn/tweedie |
Functions for computing and fitting the Tweedie family of distributions
Package: | tweedie |
Type: | Package |
Version: | 2.3.2 |
Date: | 2017-12-14 |
License: | GPL (>=2) |
Peter K Dunn
Maintainer: Peter K Dunn <[email protected]>
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi:10.1007/s11222-007-9039-6
Dunn, Peter K and Smyth, Gordon K (2005). Series evaluation of Tweedie exponential dispersion model densities Statistics and Computing, 15(4). 267–280. doi:10.1007/s11222-005-4070-y
Dunn, Peter K and Smyth, Gordon K (2001). Tweedie family densities: methods of evaluation. Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July
Jorgensen, B. (1987). Exponential dispersion models. Journal of the Royal Statistical Society, B, 49, 127–162.
Jorgensen, B. (1997). Theory of Dispersion Models. Chapman and Hall, London.
Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference (Eds. J. K. Ghosh and J. Roy), pp. 579–604. Calcutta: Indian Statistical Institute.
# Generate random numbers set.seed(987654) y <- rtweedie( 20, xi=1.5, mu=1, phi=1) # With Tweedie index xi between 1 and 2, this produces continuous # data with exact zeros x <- rnorm( length(y), 0, 1) # Unrelated predictor # With exact zeros, Tweedie index xi must be between 1 and 2 # Fit the tweedie distribution; expect xi about 1.5 library(statmod) xi.vec <- seq(1.1, 1.9, by=0.5) out <- tweedie.profile( y~1, xi.vec=xi.vec, do.plot=TRUE, verbose=TRUE) # Fit the glm require(statmod) # Provides tweedie family functions summary(glm( y ~ x, family=tweedie(var.power=out$xi.max, link.power=0) ))
# Generate random numbers set.seed(987654) y <- rtweedie( 20, xi=1.5, mu=1, phi=1) # With Tweedie index xi between 1 and 2, this produces continuous # data with exact zeros x <- rnorm( length(y), 0, 1) # Unrelated predictor # With exact zeros, Tweedie index xi must be between 1 and 2 # Fit the tweedie distribution; expect xi about 1.5 library(statmod) xi.vec <- seq(1.1, 1.9, by=0.5) out <- tweedie.profile( y~1, xi.vec=xi.vec, do.plot=TRUE, verbose=TRUE) # Fit the glm require(statmod) # Provides tweedie family functions summary(glm( y ~ x, family=tweedie(var.power=out$xi.max, link.power=0) ))
The AIC for Tweedie glms
AICtweedie( glm.obj, dispersion=NULL, k = 2, verbose=TRUE)
AICtweedie( glm.obj, dispersion=NULL, k = 2, verbose=TRUE)
glm.obj |
a fitted Tweedie |
dispersion |
the dispersion parameter |
k |
numeric: the penalty per parameter to be used; the default is |
verbose |
if |
See AIC
for more details on the AIC;
see dtweedie
for more details on computing the Tweedie densities
Returns a numeric value with the
corresponding AIC (or BIC, depending on )
Computing the AIC may take a long time.
Tweedie distributions with the index parameter as 1
correspond to Poisson distributions when .
However,
in general a Tweedie distribution with an index parameter equal to one
may not be referring to a Poisson distribution with
,
so we cannot assume that
just because the index parameter is set to one.
If the Poisson distribution is intended,
then
dispersion=1
should be specified.
The same argument applies for similar situations.
Peter Dunn ([email protected])
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi:10.1007/s11222-007-9039-6
Dunn, Peter K and Smyth, Gordon K (2005). Series evaluation of Tweedie exponential dispersion model densities Statistics and Computing, 15(4). 267–280. doi:10.1007/s11222-005-4070-y
Jorgensen, B. (1997). Theory of Dispersion Models. Chapman and Hall, London.
Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.
library(statmod) # Needed to use tweedie family object ### Generate some fictitious data test.data <- rgamma(n=200, scale=1, shape=1) ### Fit a Tweedie glm and find the AIC m1 <- glm( test.data~1, family=tweedie(link.power=0, var.power=2) ) ### A Tweedie glm with p=2 is equivalent to a gamma glm: m2 <- glm( test.data~1, family=Gamma(link=log)) ### The models are equivalent, so the AIC shoud be the same: AICtweedie(m1) AIC(m2)
library(statmod) # Needed to use tweedie family object ### Generate some fictitious data test.data <- rgamma(n=200, scale=1, shape=1) ### Fit a Tweedie glm and find the AIC m1 <- glm( test.data~1, family=tweedie(link.power=0, var.power=2) ) ### A Tweedie glm with p=2 is equivalent to a gamma glm: m2 <- glm( test.data~1, family=Gamma(link=log)) ### The models are equivalent, so the AIC shoud be the same: AICtweedie(m1) AIC(m2)
Derivatives of the log-likelihood with respect to
dtweedie.dldphi(phi, mu, power, y ) dtweedie.dldphi.saddle(phi, mu, power, y )
dtweedie.dldphi(phi, mu, power, y ) dtweedie.dldphi.saddle(phi, mu, power, y )
y |
vector of quantiles |
mu |
the mean |
phi |
the dispersion |
power |
the value of |
The Tweedie family of distributions belong to the class
of exponential dispersion models (EDMs),
famous for their role in generalized linear models.
The Tweedie distributions are the EDMs with a variance of the form
where
is greater than or equal to one, or less than or equal to zero.
This function only evaluates for
greater than or equal to one.
Special cases include the
normal (
),
Poisson (
with
),
gamma (
)
and
inverse Gaussian (
)
distributions.
For other values of
power
,
the distributions are still defined but cannot be written in closed form,
and hence evaluation is very difficult.
the value of the derivative
where
is the log-likelihood for the specified
Tweedie distribution.
dtweedie.dldphi.saddle
uses the saddlepoint approximation to determine the derivative;
dtweedie.dldphi
uses an infinite series expansion.
Peter Dunn ([email protected])
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi:10.1007/s11222-007-9039-6
Dunn, Peter K and Smyth, Gordon K (2005). Series evaluation of Tweedie exponential dispersion model densities Statistics and Computing, 15(4). 267–280. doi:10.1007/s11222-005-4070-y
Dunn, Peter K and Smyth, Gordon K (2001). Tweedie family densities: methods of evaluation. Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July
Jorgensen, B. (1987). Exponential dispersion models. Journal of the Royal Statistical Society, B, 49, 127–162.
Jorgensen, B. (1997). Theory of Dispersion Models. Chapman and Hall, London.
Sidi, Avram (1982). The numerical evaluation of very oscillatory infinite integrals by extrapolation. Mathematics of Computation 38(158), 517–529. doi:10.1090/S0025-5718-1982-0645667-5
Sidi, Avram (1988). A user-friendly extrapolation method for oscillatory infinite integrals. Mathematics of Computation 51(183), 249–266. doi:10.1090/S0025-5718-1988-0942153-5
Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute.
dtweedie.saddle
,
dtweedie
,
tweedie.profile
,
tweedie
### Plot dl/dphi against candidate values of phi power <- 2 mu <- 1 phi <- seq(2, 8, by=0.1) set.seed(10000) # For reproducability y <- rtweedie( 100, mu=mu, power=power, phi=3) # So we expect the maximum to occur at phi=3 dldphi <- dldphi.saddle <- array( dim=length(phi)) for (i in (1:length(phi))) { dldphi[i] <- dtweedie.dldphi( y=y, power=power, mu=mu, phi=phi[i]) dldphi.saddle[i] <- dtweedie.dldphi.saddle( y=y, power=power, mu=mu, phi=phi[i]) } plot( dldphi ~ phi, lwd=2, type="l", ylab=expression(phi), xlab=expression(paste("dl / d",phi) ) ) lines( dldphi.saddle ~ phi, lwd=2, col=2, lty=2) legend( "bottomright", lwd=c(2,2), lty=c(1,2), col=c(1,2), legend=c("'Exact' (using series)","Saddlepoint") ) # Neither are very good in this case!
### Plot dl/dphi against candidate values of phi power <- 2 mu <- 1 phi <- seq(2, 8, by=0.1) set.seed(10000) # For reproducability y <- rtweedie( 100, mu=mu, power=power, phi=3) # So we expect the maximum to occur at phi=3 dldphi <- dldphi.saddle <- array( dim=length(phi)) for (i in (1:length(phi))) { dldphi[i] <- dtweedie.dldphi( y=y, power=power, mu=mu, phi=phi[i]) dldphi.saddle[i] <- dtweedie.dldphi.saddle( y=y, power=power, mu=mu, phi=phi[i]) } plot( dldphi ~ phi, lwd=2, type="l", ylab=expression(phi), xlab=expression(paste("dl / d",phi) ) ) lines( dldphi.saddle ~ phi, lwd=2, col=2, lty=2) legend( "bottomright", lwd=c(2,2), lty=c(1,2), col=c(1,2), legend=c("'Exact' (using series)","Saddlepoint") ) # Neither are very good in this case!
Saddlepoint density for the Tweedie distributions
dtweedie.saddle(y, xi=NULL, mu, phi, eps=1/6, power=NULL)
dtweedie.saddle(y, xi=NULL, mu, phi, eps=1/6, power=NULL)
y |
the vector of responses |
xi |
the value of |
power |
a synonym for |
mu |
the mean |
phi |
the dispersion |
eps |
the offset in computing the variance function.
The default is |
The Tweedie family of distributions belong to the class
of exponential dispersion models (EDMs),
famous for their role in generalized linear models.
The Tweedie distributions are the EDMs with a variance of the form
where
is greater than or equal to one, or less than or equal to zero.
This function only evaluates for
greater than or equal to one.
Special cases include the
normal (
),
Poisson (
with
),
gamma (
)
and
inverse Gaussian (
)
distributions.
For other values of
power
,
the distributions are still defined but cannot be written in closed form,
and hence evaluation is very difficult.
When ,
the distribution are continuous for
greater than zero,
with a positive mass at
.
For
,
the distributions are continuous for
greater than zero.
This function approximates the density using the saddlepoint approximation defined by Nelder and Pregibon (1987).
saddlepoint (approximate) density
for the given Tweedie distribution with parameters
mu
,
phi
and
power
.
Peter Dunn ([email protected])
Daniels, H. E. (1954). Saddlepoint approximations in statistics. Annals of Mathematical Statistics, 25(4), 631–650.
Daniels, H. E. (1980). Exact saddlepoint approximations. Biometrika, 67, 59–63. doi:10.1093/biomet/67.1.59
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi:10.1007/s11222-007-9039-6
Dunn, Peter K and Smyth, Gordon K (2001). Tweedie family densities: methods of evaluation. Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July
Dunn, Peter K and Smyth, Gordon K (2005). Series evaluation of Tweedie exponential dispersion model densities Statistics and Computing, 15(4). 267–280. doi:10.1007/s11222-005-4070-y
Jorgensen, B. (1987). Exponential dispersion models. Journal of the Royal Statistical Society, B, 49, 127-162.
Jorgensen, B. (1997). Theory of Dispersion Models, Chapman and Hall, London.
Nelder, J. A. and Pregibon, D. (1987). An extended quasi-likelihood function. Biometrika, 74(2), 221–232. doi:10.1093/biomet/74.2.221
Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute.
p <- 2.5 mu <- 1 phi <- 1 y <- seq(0, 10, length=100) fy <- dtweedie( y=y, power=p, mu=mu, phi=phi) plot(y, fy, type="l") # Compare to the saddlepoint density f.saddle <- dtweedie.saddle( y=y, power=p, mu=mu, phi=phi) lines( y, f.saddle, col=2 )
p <- 2.5 mu <- 1 phi <- 1 y <- seq(0, 10, length=100) fy <- dtweedie( y=y, power=p, mu=mu, phi=phi) plot(y, fy, type="l") # Compare to the saddlepoint density f.saddle <- dtweedie.saddle( y=y, power=p, mu=mu, phi=phi) lines( y, f.saddle, col=2 )
The log likelihood for Tweedie models
logLiktweedie( glm.obj, dispersion=NULL)
logLiktweedie( glm.obj, dispersion=NULL)
glm.obj |
a fitted Tweedie |
dispersion |
the dispersion parameter |
The log-likelihood is computed from the AIC,
so see AICtweedie
for more details.
Returns the log-likelihood from the specified model
Computing the log-likelihood may take a long time.
Tweedie distributions with the index parameter as 1
correspond to Poisson distributions when .
However,
in general a Tweedie distribution with an index parameter equal to one
may not be referring to a Poisson distribution with
,
so we cannot assume that
just because the index parameter is set to one.
If the Poisson distribution is intended,
then
dispersion=1
should be specified.
The same argument applies for similar situations.
Peter Dunn ([email protected])
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi:10.1007/s11222-007-9039-6
Dunn, Peter K and Smyth, Gordon K (2005). Series evaluation of Tweedie exponential dispersion model densities Statistics and Computing, 15(4). 267–280. doi:10.1007/s11222-005-4070-y
Jorgensen, B. (1997). Theory of Dispersion Models. Chapman and Hall, London.
Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.
library(statmod) # Needed to use tweedie family object ### Generate some fictitious data test.data <- rgamma(n=200, scale=1, shape=1) ### Fit a Tweedie glm and find the AIC m1 <- glm( test.data~1, family=tweedie(link.power=0, var.power=2) ) ### A Tweedie glm with p=2 is equivalent to a gamma glm: m2 <- glm( test.data~1, family=Gamma(link=log)) ### The models are equivalent, so the AIC shoud be the same: logLiktweedie(m1) logLik(m2)
library(statmod) # Needed to use tweedie family object ### Generate some fictitious data test.data <- rgamma(n=200, scale=1, shape=1) ### Fit a Tweedie glm and find the AIC m1 <- glm( test.data~1, family=tweedie(link.power=0, var.power=2) ) ### A Tweedie glm with p=2 is equivalent to a gamma glm: m2 <- glm( test.data~1, family=Gamma(link=log)) ### The models are equivalent, so the AIC shoud be the same: logLiktweedie(m1) logLik(m2)
Density, distribution function, quantile function and random generation for the Tweedie family of distributions
dtweedie(y, xi=NULL, mu, phi, power=NULL) dtweedie.series(y, power, mu, phi) dtweedie.inversion(y, power, mu, phi, exact=TRUE, method) dtweedie.stable(y, power, mu, phi) ptweedie(q, xi=NULL, mu, phi, power=NULL) ptweedie.series(q, power, mu, phi) qtweedie(p, xi=NULL, mu, phi, power=NULL) rtweedie(n, xi=NULL, mu, phi, power=NULL)
dtweedie(y, xi=NULL, mu, phi, power=NULL) dtweedie.series(y, power, mu, phi) dtweedie.inversion(y, power, mu, phi, exact=TRUE, method) dtweedie.stable(y, power, mu, phi) ptweedie(q, xi=NULL, mu, phi, power=NULL) ptweedie.series(q, power, mu, phi) qtweedie(p, xi=NULL, mu, phi, power=NULL) rtweedie(n, xi=NULL, mu, phi, power=NULL)
y , q
|
vector of quantiles |
p |
vector of probabilities |
n |
the number of observations |
xi |
the value of |
power |
a synonym for |
mu |
the mean |
phi |
the dispersion |
exact |
logical flag;
if |
method |
either |
The Tweedie family of distributions belong to the class
of exponential dispersion models (EDMs),
famous for their role in generalized linear models.
The Tweedie distributions are the EDMs with a variance of the form
where
is greater than or equal to one, or less than or equal to zero.
This function only evaluates for
greater than or equal to one.
Special cases include the
normal (
),
Poisson (
with
),
gamma (
)
and
inverse Gaussian (
)
distributions.
For other values of
power
,
the distributions are still defined but cannot be written in closed form,
and hence evaluation is very difficult.
When ,
the distribution are continuous for
greater than zero,
with a positive mass at
.
For
,
the distributions are continuous for
greater than zero.
This function evaluates the density or cumulative probability using one of two methods, depending on the combination of parameters. One method is the evaluation of an infinite series. The second interpolates some stored values computed from a Fourier inversion technique.
The function dtweedie.inversion
evaluates the density using a Fourier series technique;
ptweedie.inversion
does likewise for the cumulative
probabilities.
The actual code is contained in an external FORTRAN program.
Different code is used for
and for
.
The function dtweedie.series
evaluates the density
using a series expansion;
a different series expansion
is used for and for
.
The function
ptweedie.series
does likewise for the
cumulative probabilities but only for .
The function dtweedie.stable
exploits the link between
the stable distribution (Nolan, 1997) and Tweedie distributions,
as discussed in Jorgensen, Chapter 4.
These are computed using Nolan's algorithm as implemented
in the stabledist
package (which is therefore required to use
the dtweedie.stable
function).
The function dtweedie
uses a two-dimensional interpolation procedure to
compute the density for some parts of the parameter space from
previously computed values found from the series or the
inversion. For other parts of the parameter space,
the series solution is found.
ptweedie
returns either the computed series
solution or inversion solution.
density (dtweedie
),
probability (ptweedie
),
quantile (qtweedie
)
or random sample (rtweedie
)
for the given Tweedie distribution with parameters
mu
,
phi
and
power
.
The method
s changed from version 1.4 to 1.5
(methods 1 and 2 swapped).
The methods are defined in Dunn and Smyth (2008).
Peter Dunn ([email protected])
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi:10.1007/s11222-007-9039-6
Dunn, Peter K and Smyth, Gordon K (2005). Series evaluation of Tweedie exponential dispersion model densities Statistics and Computing, 15(4). 267–280. doi:10.1007/s11222-005-4070-y
Dunn, Peter K and Smyth, Gordon K (2001). Tweedie family densities: methods of evaluation. Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July
Jorgensen, B. (1987). Exponential dispersion models. Journal of the Royal Statistical Society, B, 49, 127–162.
Jorgensen, B. (1997). Theory of Dispersion Models. Chapman and Hall, London.
Nolan, John P (1997). Numerical calculation of stable densities and distribution functions. Communication in Statistics—Stochastic models, 13(4). 759–774. doi:10.1080/15326349708807450
Sidi, Avram (1982). The numerical evaluation of very oscillatory infinite integrals by extrapolation. Mathematics of Computation 38(158), 517–529. doi:10.1090/S0025-5718-1982-0645667-5
Sidi, Avram (1988). A user-friendly extrapolation method for oscillatory infinite integrals. Mathematics of Computation 51(183), 249–266. doi:10.1090/S0025-5718-1988-0942153-5
Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute.
### Plot a Tweedie density power <- 2.5 mu <- 1 phi <- 1 y <- seq(0, 6, length=500) fy <- dtweedie( y=y, power=power, mu=mu, phi=phi) plot(y, fy, type="l", lwd=2, ylab="Density") # Compare to the saddlepoint density f.saddle <- dtweedie.saddle( y=y, power=power, mu=mu, phi=phi) lines( y, f.saddle, col=2 ) legend("topright", col=c(1,2), lwd=c(2,1), legend=c("Actual","Saddlepoint") ) ### A histogram of Tweedie random numbers hist( rtweedie( 1000, power=1.2, mu=1, phi=1) ) ### An example of the multimodal feature of the Tweedie ### family with power near 1 (from Dunn and Smyth, 2005). y <- seq(0.001,2,len=1000) mu <- 1 phi <- 0.1 p <- 1.02 f1 <- dtweedie(y,mu=mu,phi=phi,power=p) plot(y, f1, type="l", xlab="y", ylab="Density") p <- 1.05 f2<- dtweedie(y,mu=mu,phi=phi,power=p) lines(y,f2, col=2) ### Compare series and saddlepoint methods y <- seq(0.001,2,len=1000) mu <- 1 phi <- 0.1 p <- 1.02 f.series <- dtweedie.series( y,mu=mu,phi=phi,power=p ) f.saddle <- dtweedie.saddle( y,mu=mu,phi=phi,power=p ) f.all <- c( f.series, f.saddle ) plot( range(f.all) ~ range( y ), xlab="y", ylab="Density", type="n") lines( f.series ~ y, lty=1, col=1) lines( f.saddle ~ y, lty=3, col=3) legend("topright", lty=c(1,3), col=c(1,3), legend=c("Series","Saddlepoint") )
### Plot a Tweedie density power <- 2.5 mu <- 1 phi <- 1 y <- seq(0, 6, length=500) fy <- dtweedie( y=y, power=power, mu=mu, phi=phi) plot(y, fy, type="l", lwd=2, ylab="Density") # Compare to the saddlepoint density f.saddle <- dtweedie.saddle( y=y, power=power, mu=mu, phi=phi) lines( y, f.saddle, col=2 ) legend("topright", col=c(1,2), lwd=c(2,1), legend=c("Actual","Saddlepoint") ) ### A histogram of Tweedie random numbers hist( rtweedie( 1000, power=1.2, mu=1, phi=1) ) ### An example of the multimodal feature of the Tweedie ### family with power near 1 (from Dunn and Smyth, 2005). y <- seq(0.001,2,len=1000) mu <- 1 phi <- 0.1 p <- 1.02 f1 <- dtweedie(y,mu=mu,phi=phi,power=p) plot(y, f1, type="l", xlab="y", ylab="Density") p <- 1.05 f2<- dtweedie(y,mu=mu,phi=phi,power=p) lines(y,f2, col=2) ### Compare series and saddlepoint methods y <- seq(0.001,2,len=1000) mu <- 1 phi <- 0.1 p <- 1.02 f.series <- dtweedie.series( y,mu=mu,phi=phi,power=p ) f.saddle <- dtweedie.saddle( y,mu=mu,phi=phi,power=p ) f.all <- c( f.series, f.saddle ) plot( range(f.all) ~ range( y ), xlab="y", ylab="Density", type="n") lines( f.series ~ y, lty=1, col=1) lines( f.saddle ~ y, lty=3, col=3) legend("topright", lty=c(1,3), col=c(1,3), legend=c("Series","Saddlepoint") )
Internal tweedie functions.
dtweedie.dlogfdphi(y, mu, phi, power) dtweedie.logl(phi, y, mu, power) dtweedie.logl.saddle( phi, power, y, mu, eps=0) dtweedie.logv.bigp( y, phi, power) dtweedie.logw.smallp(y, phi, power) dtweedie.interp(grid, nx, np, xix.lo, xix.hi,p.lo, p.hi, power, xix) dtweedie.jw.smallp(y, phi, power ) dtweedie.kv.bigp(y, phi, power) dtweedie.series.bigp(power, y, mu, phi) dtweedie.series.smallp(power, y, mu, phi) stored.grids(power) twpdf(p, phi, y, mu, exact, verbose, funvalue, exitstatus, relerr, its ) twcdf(p, phi, y, mu, exact, funvalue, exitstatus, relerr, its )
dtweedie.dlogfdphi(y, mu, phi, power) dtweedie.logl(phi, y, mu, power) dtweedie.logl.saddle( phi, power, y, mu, eps=0) dtweedie.logv.bigp( y, phi, power) dtweedie.logw.smallp(y, phi, power) dtweedie.interp(grid, nx, np, xix.lo, xix.hi,p.lo, p.hi, power, xix) dtweedie.jw.smallp(y, phi, power ) dtweedie.kv.bigp(y, phi, power) dtweedie.series.bigp(power, y, mu, phi) dtweedie.series.smallp(power, y, mu, phi) stored.grids(power) twpdf(p, phi, y, mu, exact, verbose, funvalue, exitstatus, relerr, its ) twcdf(p, phi, y, mu, exact, funvalue, exitstatus, relerr, its )
y |
the vector of responses |
power |
the value of |
mu |
the mean |
phi |
the dispersion |
grid |
the interpolation grid necessary for the given value of |
nx |
the number of interpolation points in the |
np |
the number of interpolation points in the |
xix.lo |
the lower value of the transformed |
xix.hi |
the higher value of the transformed |
p.lo |
the lower value of |
p.hi |
the higher value of |
xix |
the value of the transformed |
eps |
the offset in computing the variance function in the saddlepoint approximation.
The default is |
p |
the Tweedie index parameter |
exact |
a flag for the FORTRAN to use exact-zeros acceleration algorithmic the calculation (1 means to do so) |
verbose |
a flag for the FORTRAN: 1 means to be verbose |
funvalue |
the value of the call returned by the FORTRAN code |
exitstatus |
the exit status returned by the FORTRAN code |
relerr |
an estimation of the relative error returned by the FORTRAN code |
its |
the number of iterations of the algorithm returned by the FORTRAN code |
These are not to be called by the user.
Peter Dunn ([email protected])
Nelder, J. A. and Pregibon, D. (1987). An extended quasi-likelihood function Biometrika, 74(2), 221–232. doi10.1093/biomet/74.2.221
Converts Tweedie distribution parameters to the parameters of the underlying distributions
tweedie.convert( xi=NULL, mu, phi, power=NULL)
tweedie.convert( xi=NULL, mu, phi, power=NULL)
xi |
the value of |
power |
a synonym for |
mu |
the mean |
phi |
the dispersion |
The Tweedie family of distributions with
is the Poisson sum of gamma distributions
(where the Poisson distribution has mean
,
and the gamma distribution has scale and shape parameters).
When used to fit a glm,
the model is fitted with the usual glm parameters:
the mean
and the dispersion parameter
.
This function converts the parameters
to the values of the parameters of the underlying Poisson distribution
and gamma distribution (scale and shape parameters).
a list containing the values of
the mean of the underlying Poisson distribution (as poisson.lambda
),
the scale parameter of the underlying gamma distribution (as gamma.scale
),
the shape parameter of the underlying gamma distribution (as gamma.shape
),
the probability of obtaining a zero response (as p0
),
the mean of the underlying gamma distribution (as gamma.mean
),
and
the dispersion parameter of the underlying gamma distribution (as gamma.phi
).
Peter Dunn ([email protected])
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi:10.1007/s11222-007-9039-6
Dunn, Peter K and Smyth, Gordon K (2005). Series evaluation of Tweedie exponential dispersion model densities Statistics and Computing, 15(4). 267–280. doi:10.1007/s11222-005-4070-y
Dunn, Peter K and Smyth, Gordon K (2001). Tweedie family densities: methods of evaluation. Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July
Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute.
tweedie.convert(xi=1.5, mu=1, phi=1)
tweedie.convert(xi=1.5, mu=1, phi=1)
The deviance function for the Tweedie family of distributions
tweedie.dev(y, mu, power)
tweedie.dev(y, mu, power)
y |
vector of quantiles (which can be zero if |
mu |
the mean |
power |
the value of |
The Tweedie family of distributions belong to the class
of exponential dispersion models (EDMs),
famous for their role in generalized linear models.
The Tweedie distributions are the EDMs with a variance of the form
where
is greater than or equal to one, or less than or equal to zero.
This function only evaluates for
greater than or equal to one.
Special cases include the
normal (
),
Poisson (
with
),
gamma (
)
and
inverse Gaussian (
)
distributions.
For other values of
power
,
the distributions are still defined but cannot be written in closed form,
and hence evaluation is very difficult.
The deviance is defined by deviance
as
“up to a constant, minus twice the maximized log-likelihood.
Where sensible, the constant is chosen so that a saturated
model has deviance zero.”
the value of the deviance
for the given Tweedie distribution with parameters
mu
,
phi
and
power
.
Peter Dunn ([email protected])
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi:10.1007/s11222-007-9039-6
Dunn, Peter K and Smyth, Gordon K (2005). Series evaluation of Tweedie exponential dispersion model densities Statistics and Computing, 15(4). 267–280. doi:10.1007/s11222-005-4070-y
Dunn, Peter K and Smyth, Gordon K (2001). Tweedie family densities: methods of evaluation. Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July
Jorgensen, B. (1987). Exponential dispersion models. Journal of the Royal Statistical Society, B, 49, 127–162.
Jorgensen, B. (1997). Theory of Dispersion Models. Chapman and Hall, London.
Sidi, Avram (1982). The numerical evaluation of very oscillatory infinite integrals by extrapolation. Mathematics of Computation 38(158), 517–529. doi:10.1090/S0025-5718-1982-0645667-5
Sidi, Avram (1988). A user-friendly extrapolation method for oscillatory infinite integrals. Mathematics of Computation 51(183), 249–266. doi:10.1090/S0025-5718-1988-0942153-5
Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute.
dtweedie
,
dtweedie.saddle
,
tweedie
,
deviance
,
glm
### Plot a Tweedie deviance function when 1<p<2 mu <- 1 y <- seq(0, 6, length=100) dev1 <- tweedie.dev( y=y, mu=mu, power=1.1) dev2 <- tweedie.dev( y=y, mu=mu, power=1.5) dev3 <- tweedie.dev( y=y, mu=mu, power=1.9) plot(range(y), range( c(dev1, dev2, dev3)), type="n", lwd=2, ylab="Deviance", xlab=expression(italic(y)) ) lines( y, dev1, lty=1, col=1, lwd=2 ) lines( y, dev2, lty=2, col=2, lwd=2 ) lines( y, dev3, lty=3, col=3, lwd=2 ) legend("top", col=c(1,2,3), lwd=c(2,2,2), lty=c(1,2,3), legend=c("p=1.1","p=1.5", "p=1.9") ) ### Plot a Tweedie deviance function when p>2 mu <- 1 y <- seq(0.1, 6, length=100) dev1 <- tweedie.dev( y=y, mu=mu, power=2) # Gamma dev2 <- tweedie.dev( y=y, mu=mu, power=3) # Inverse Gaussian dev3 <- tweedie.dev( y=y, mu=mu, power=4) plot(range(y), range( c(dev1, dev2, dev3)), type="n", lwd=2, ylab="Deviance", xlab=expression(italic(y)) ) lines( y, dev1, lty=1, col=1, lwd=2 ) lines( y, dev2, lty=2, col=2, lwd=2 ) lines( y, dev3, lty=3, col=3, lwd=2 ) legend("top", col=c(1,2,3), lwd=c(2,2,2), lty=c(1,2,3), legend=c("p=2 (gamma)", "p=3 (inverse Gaussian)", "p=4") )
### Plot a Tweedie deviance function when 1<p<2 mu <- 1 y <- seq(0, 6, length=100) dev1 <- tweedie.dev( y=y, mu=mu, power=1.1) dev2 <- tweedie.dev( y=y, mu=mu, power=1.5) dev3 <- tweedie.dev( y=y, mu=mu, power=1.9) plot(range(y), range( c(dev1, dev2, dev3)), type="n", lwd=2, ylab="Deviance", xlab=expression(italic(y)) ) lines( y, dev1, lty=1, col=1, lwd=2 ) lines( y, dev2, lty=2, col=2, lwd=2 ) lines( y, dev3, lty=3, col=3, lwd=2 ) legend("top", col=c(1,2,3), lwd=c(2,2,2), lty=c(1,2,3), legend=c("p=1.1","p=1.5", "p=1.9") ) ### Plot a Tweedie deviance function when p>2 mu <- 1 y <- seq(0.1, 6, length=100) dev1 <- tweedie.dev( y=y, mu=mu, power=2) # Gamma dev2 <- tweedie.dev( y=y, mu=mu, power=3) # Inverse Gaussian dev3 <- tweedie.dev( y=y, mu=mu, power=4) plot(range(y), range( c(dev1, dev2, dev3)), type="n", lwd=2, ylab="Deviance", xlab=expression(italic(y)) ) lines( y, dev1, lty=1, col=1, lwd=2 ) lines( y, dev2, lty=2, col=2, lwd=2 ) lines( y, dev3, lty=3, col=3, lwd=2 ) legend("top", col=c(1,2,3), lwd=c(2,2,2), lty=c(1,2,3), legend=c("p=2 (gamma)", "p=3 (inverse Gaussian)", "p=4") )
Plotting Tweedie density and distribution functions
tweedie.plot(y, xi, mu, phi, type="pdf", power=NULL, add=FALSE, ...)
tweedie.plot(y, xi, mu, phi, type="pdf", power=NULL, add=FALSE, ...)
y |
vector of values at which to evaluate and plot |
xi |
the value of |
power |
a synonym for |
mu |
the mean |
phi |
the dispersion |
type |
what to plot: |
add |
if |
... |
Arguments to be passed to the plotting method |
For details, see dtweedie
this function is usually called for side-effect of
producing a plot of the specified Tweedie distribution,
properly plotting the exact zero that occurs at
when
.
However,
it also produces a list with the computed density at the given points,
with components
y
and x
respectively,
such that plot(y~x)
approximately reproduces the plot.
Peter Dunn ([email protected])
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi:10.1007/s11222-007-9039-6
Dunn, Peter K and Smyth, Gordon K (2005). Series evaluation of Tweedie exponential dispersion model densities Statistics and Computing, 15(4). 267–280. doi:10.1007/s11222-005-4070-y
Dunn, Peter K and Smyth, Gordon K (2001). Tweedie family densities: methods of evaluation. Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July
Jorgensen, B. (1987). Exponential dispersion models. Journal of the Royal Statistical Society, B, 49, 127–162.
Jorgensen, B. (1997). Theory of Dispersion Models. Chapman and Hall, London.
Nolan, John P (1997). Numerical calculation of stable densities and distribution functions. Communication in Statistics—Stochastic models, 13(4). 759–774. doi:10.1080/15326349708807450
Sidi, Avram (1982). The numerical evaluation of very oscillatory infinite integrals by extrapolation. Mathematics of Computation 38(158), 517–529. doi:10.1090/S0025-5718-1982-0645667-5
Sidi, Avram (1988). A user-friendly extrapolation method for oscillatory infinite integrals. Mathematics of Computation 51(183), 249–266. doi:10.1090/S0025-5718-1988-0942153-5
Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute.
### Plot a Tweedie density with 1<p<2 yy <- seq(0,5,length=100) tweedie.plot( power=1.7, mu=1, phi=1, y=yy, lwd=2) tweedie.plot( power=1.2, mu=1, phi=1, y=yy, add=TRUE, lwd=2, col="red") legend("topright",lwd=c(2,2), col=c("black","red"), pch=c(19,19), legend=c("p=1.7","p=1.2") ) ### Plot distribution functions tweedie.plot( power=1.05, mu=1, phi=1, y=yy, lwd=2, type="cdf", ylim=c(0,1)) tweedie.plot( power=2, mu=1, phi=1, y=yy, add=TRUE, lwd=2, type="cdf",col="red") legend("bottomright",lwd=c(2,2), col=c("black","red"), legend=c("p=1.05","p=2") ) ### Now, plot two densities, combining p>2 and 1<p<2 tweedie.plot( power=3.5, mu=1, phi=1, y=yy, lwd=2) tweedie.plot( power=1.5, mu=1, phi=1, y=yy, lwd=2, col="red", add=TRUE) legend("topright",lwd=c(2,2), col=c("black","red"), pch=c(NA,19), legend=c("p=3.5","p=1.5") )
### Plot a Tweedie density with 1<p<2 yy <- seq(0,5,length=100) tweedie.plot( power=1.7, mu=1, phi=1, y=yy, lwd=2) tweedie.plot( power=1.2, mu=1, phi=1, y=yy, add=TRUE, lwd=2, col="red") legend("topright",lwd=c(2,2), col=c("black","red"), pch=c(19,19), legend=c("p=1.7","p=1.2") ) ### Plot distribution functions tweedie.plot( power=1.05, mu=1, phi=1, y=yy, lwd=2, type="cdf", ylim=c(0,1)) tweedie.plot( power=2, mu=1, phi=1, y=yy, add=TRUE, lwd=2, type="cdf",col="red") legend("bottomright",lwd=c(2,2), col=c("black","red"), legend=c("p=1.05","p=2") ) ### Now, plot two densities, combining p>2 and 1<p<2 tweedie.plot( power=3.5, mu=1, phi=1, y=yy, lwd=2) tweedie.plot( power=1.5, mu=1, phi=1, y=yy, lwd=2, col="red", add=TRUE) legend("topright",lwd=c(2,2), col=c("black","red"), pch=c(NA,19), legend=c("p=3.5","p=1.5") )
Maximum likelihood estimation of the Tweedie index parameter .
tweedie.profile(formula, p.vec=NULL, xi.vec=NULL, link.power=0, data, weights, offset, fit.glm=FALSE, do.smooth=TRUE, do.plot=FALSE, do.ci=do.smooth, eps=1/6, control=list( epsilon=1e-09, maxit=glm.control()$maxit, trace=glm.control()$trace ), do.points=do.plot, method="inversion", conf.level=0.95, phi.method=ifelse(method == "saddlepoint", "saddlepoint", "mle"), verbose=FALSE, add0=FALSE)
tweedie.profile(formula, p.vec=NULL, xi.vec=NULL, link.power=0, data, weights, offset, fit.glm=FALSE, do.smooth=TRUE, do.plot=FALSE, do.ci=do.smooth, eps=1/6, control=list( epsilon=1e-09, maxit=glm.control()$maxit, trace=glm.control()$trace ), do.points=do.plot, method="inversion", conf.level=0.95, phi.method=ifelse(method == "saddlepoint", "saddlepoint", "mle"), verbose=FALSE, add0=FALSE)
formula |
a formula expression as for other regression models and generalized linear models,
of the form |
p.vec |
a vector of |
xi.vec |
the same as |
link.power |
the power link function to use.
These link functions |
data |
an optional data frame, list or environment
(or object coercible by |
weights |
an optional vector of weights to be used in the fitting
process. Should be |
offset |
this can be used to specify an a priori
known component to be included in the linear predictor during fitting.
This should be |
fit.glm |
logical flag.
If |
do.smooth |
logical flag.
If |
do.plot |
logical flag.
If |
do.ci |
logical flag.
If |
eps |
the offset in computing the variance function.
The default is |
control |
a list of parameters for controlling the fitting process;
see |
do.points |
plot the points on the plot where the
(log-) likelihood is computed for the given values of |
method |
the method for computing the (log-) likelihood.
One of
|
conf.level |
the confidence level for the computation of the nominal
confidence interval.
The default is |
phi.method |
the method for estimating |
verbose |
the amount of feedback requested:
|
add0 |
if |
For each value in p.vec
,
the function computes an estimate of phi
and then computes the value of the log-likelihood for these parameters.
The plot of the log-likelihood against p.vec
allows the maximum likelihood value of p
to be found.
Once the value of p
is found,
the distribution within the class of Tweedie distribution is identified.
The main purpose of the function is to estimate the value
of the Tweedie index parameter, ,
which is produced by the output list as
p.max
.
Optionally (if do.plot=TRUE
),
a plot is produced that shows the profile log-likelihood
computed at each value in p.vec
(smoothed if do.smooth=TRUE
).
This function can be temperamental
(for theoretical reasons involved in numerically computing the density),
and this plot shows the values of requested on the
horizontal axis (using
rug
);
there may be fewer points on the plot,
since the likelihood some values of requested
may have returned
NaN
, Inf
or NA
.
A list containing the components:
y
and x
(such that plot(x,y)
(partially)
recreates the profile likelihood plot);
ht
(the height of the nominal confidence interval);
L
(the estimate of the (log-) likelihood at each given value of p
);
p
(the p
-values used);
phi
(the computed values of phi
at the values in p
);
p.max
(the estimate of the mle of p
);
L.max
(the estimate of the (log-) likelihood at p.max
);
phi.max
(the estimate of phi
at p.max
);
ci
(the lower and upper limits of the confidence interval for p
);
method
(the method used for estimation: series
, inversion
,
interpolation
or saddlepoint
);
phi.method
(the method used for estimation of phi
:
saddlepoint
or phi
).
If glm.fit
is TRUE
,
the list also contains a component glm.obj
,
a glm
object for the fitted Tweedie generalized linear model.
The estimates of p
and phi
are printed.
The result is printed invisibly.
If the response variable has any exact zeros,
the values in p.vec
must all be between one and two.
The function is sometimes unstable and may fail.
It may also be very slow.
One solution is to change the method.
The default is method="inversion"
(the default);
then try method="series"
,
method="interpolation"
and
method="saddlepoint"
in that order.
Note that
method="saddlepoint"
is an approximate method only.
Also make sure the values in p.vec
are suitable for the data
(see above paragraph).
It is recommended that for the first use with a data set,
use p.vec
with only a small number of values
and set
do.smooth=FALSE
,
do.ci=FALSE
.
If this is successful,
a larger vector p.vec
and smoothing can be used.
Peter Dunn ([email protected])
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi:10.1007/s11222-007-9039-6
Dunn, Peter K and Smyth, Gordon K (2005). Series evaluation of Tweedie exponential dispersion model densities Statistics and Computing, 15(4). 267–280. doi:10.1007/s11222-005-4070-y
Dunn, Peter K and Smyth, Gordon K (2001). Tweedie family densities: methods of evaluation. Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July
Jorgensen, B. (1987). Exponential dispersion models. Journal of the Royal Statistical Society, B, 49, 127–162.
Jorgensen, B. (1997). Theory of Dispersion Models. Chapman and Hall, London.
Nelder, J. A. and Pregibon, D. (1987). An extended quasi-likelihood function. Biometrika 74(2), 221–232. doi:10.1093/biomet/74.2.221
Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute.
dtweedie
,
dtweedie.saddle
,
tweedie
library(statmod) # Needed to use tweedie.profile # Generate some fictitious data test.data <- rgamma(n=200, scale=1, shape=1) # The gamma is a Tweedie distribution with power=2; # let's see if p=2 is suggested by tweedie.profile: ## Not run: out <- tweedie.profile( test.data ~ 1, p.vec=seq(1.5, 2.5, by=0.2) ) out$p.max out$ci ## End(Not run)
library(statmod) # Needed to use tweedie.profile # Generate some fictitious data test.data <- rgamma(n=200, scale=1, shape=1) # The gamma is a Tweedie distribution with power=2; # let's see if p=2 is suggested by tweedie.profile: ## Not run: out <- tweedie.profile( test.data ~ 1, p.vec=seq(1.5, 2.5, by=0.2) ) out$p.max out$ci ## End(Not run)