Parametric Bivariate Regression Analysis Based On Censored Samples ...
c
Heldermann Verlag
Economic Quality Control
ISSN 0940-5151
Vol 19 (2004), No. 1, 83 – 90
Parametric Bivariate Regression Analysis
Based on Censored Samples: A Weibull Model
David D. Hanagal
Abstract: In this paper, we propose a new bivariate Weibull distribution and derive estimation
and test procedures based on censored samples with common covariates. The study of bivari-
ate Weibull was motivated by interesting biometrical applications. The maximum likelihood
estimators for the model parameters are obtained together with a test of significance.
Key words: Bivariate Weibull model, Parametric regression, Survival times.
1
Introduction
Hanagal [3] proposed a multivariate Weibull distribution which includes the multivariate
exponential of Marshall-Olkin [4]. Thus, our bivariate Weibull (BVW) model represents
a super model of the BVE of Marshall-Olkin [4]. Some biometrical applications have
motivated to study a BVW regression model in connection with paired organs like kidneys,
eyes, ears or others. These pairs of an individual (e.g. patient) is considered as a two
component system. The components are assumed to be dependent, as it is quite common
that for instance simultaneous failure of organs may occur.
The life time of an individual is assumed to be independent of the life times of the paired
organs. Because death of an individual will censor both life times of organs, it is possible
to use univariate censoring as given by Hanagal [1, 2].
In the above mentioned biometrical applications, the covariates were age, sex, smoking
or alcoholic habits of the patient, diabetic or non-diabetic conditions or some specific
diseases of the patient etc. In such situations, it is not realistic to treat the life times of
the paired organs as identically distributed BVW, as it depends heavily of the common
covariates. These covariates are the characteristics or properties of a patient and are not
different for each organ of an individual.
In Section 2, the BVW distribution is introduced; Section 3 is devoted to the maximum
likelihood estimator of the parameters, and in Section 4, a significance test for the regres-
sion parameters is briefly outlined.
84
David D. Hanagal
2
Bivariate Weibull Distribution (BVW)
Hanagal [3] proposed a (k + 2) parameter family of k-variate Weibull distributions. The
BVW is obtained by specifying k = 2; it has the following probability density function:
λ
1(λ2 + λ3)σ2(t1t2)σ−1e−λ1tσ1−(λ2+λ3)tσ2 for 0 < t1 < t2 < ∞
f(t1, t2) =
λ
2(λ1 + λ3)σ2(t1t2)σ−1e−λ2tσ2−(λ1+λ3)tσ1 for 0 < t2 < t1 < ∞
(1)
λ3σtσ−1e−(λ1+λ2+λ3)tσ
for 0 < t1 = t2 = t < ∞
where λ1, λ2, λ3 > 0.
The BVW distribution (1) has remarkable properties. Both marginal distributions of
BVW are univariate Weibull. More specifically speaking, we can state that
Let (T1, T2) ∼ BVW(λ1, λ2, λ3; σ) then the following implications hold:
•
T1 ∼ Weibull((λ1 + λ3), σ)
(2)
•
T2 ∼ Weibull((λ2 + λ3), σ)
(3)
•
T3 = min(T1, T2) ∼ Weibull((λ1 + λ2 + λ3), σ)
(4)
where (λ1 + λ3), (λ2 + λ3) and (λ1 + λ2 + λ3) are the scale parameters of T1, T2 and
T3, respectively and σ is the common shape parameter. The parameter λ3 quantifies
the dependence between the two variables (T1, T2) and λ3 = 0 implies T1 and T2 are
independent.
The BVW distribution (1) is not absolutely continuous with respect to Lebesque measure
in R2. It has a singularity on the diagonal t1 = t2, which represents simultaneous failure
of T1 and T2
The probability of simultaneous failures is easily obtained:
λ
P
3
(T1 = T2) = λ
(5)
1 + λ2 + λ3
which also gives the correlation between T1 and T2.
The diagonal t1 = t2 separates the plane in two regions. The probability for these are the
following:
λ
P
1
(T1 < T2) = λ
(6)
1 + λ2 + λ3
λ
P
2
(T1 > T2) = λ
(7)
1 + λ2 + λ3
Parametric Bivariate Regression Analysis Based on Censored Samples: A Weibull Model
85
From (1) the BVE of Marshall-Olkin [4] is obtainde for σ = 1. Taking the logarithm
of BVE variables i.e., Y1=logT1 and Y2=logT2 leads to a bivariate extreme value(BVEV)
distribution with the following probability density function:
ey1+logλ1+y2+log(λ2+λ3)−ey1+logλ1−ey2+log(λ2+λ3) for −∞<y
1 < y2 < ∞
f(y1, y2)=
ey2+log λ2+y1+log(λ1+λ3)−ey2+logλ2−ey1+log(λ1+λ3)
for −∞ <y2 <y1 < ∞
(8)
ey+logλ3−ey+log(λ1+λ2+λ3)
for −∞ <y1 = y2 = y < ∞
From (1) and (8) it can be seen that (λ1, λ2, λ3) are not proper scale parameters for BVE
of Marshall-Olkin(1967). In the univariate case, the location parameter of univariate
extreme value distribution is the scale parameter of the exponential distribution. Thus,
BVE of Marshall-Olkin(1967) and corresponding BVEV do not belong to the location-
scale family.
Assuming that the variables of interest (Y1, Y2) depend on some covariates given by
(X1, X2) leads to regarding the following regression model for the two component sys-
tem,
Y1
b1X1
1
U1
Y
=
+
(9)
2
b
U
2X2
σ
2
where X1 and X2 are m-dimensional vectors of covariates, b1 and b2 are m-dimensional
vectors of regression coefficients and (U1, U2) follows a BVEV distribution as given by (8).
Knowing that the BVEV distribution does not belong to the location-scale family, there
is no intercept term included in the regression model.
Taking Y1 = logT1 and Y2 = logT2, we obtain
T1
eb1X1V 1/σ
1
T
=
(10)
2
eb2X2V 1/σ
2
where (V1 = eU1, V2 = eU2) is BVE of Marshall-Olkin(1967). Alternately we may write
V1
e−σb1X1T σ1
V
=
(11)
2
e−σb2X2T σ2
Making the additional assumption that both components have common covariates and
regression parameters, i.e., X1 = X2 and b1 = b2, then we have
V1
T σ1
e−σb X
V
=
(12)
2
T σ2
86
David D. Hanagal
3
ML-Estimation of the Parameters
The univariate random censoring scheme given by Hanagal [1, 2] is used for estimating
the bivariate life time distribution, which takes into account that individuals do not enter
at the same time the study and a withdrawal of an individual will censor both life times
of the components which in the sequel will be called implants, because the model was
developed and applied in the framework of teeth implants for upper and lower jaws.
Suppose that there are n independent pairs of implants under study, where the ith pair
of implants have life times ( ˜
T1i, ˜
T2i) and a censoring time (Zi).
Let the censored random life of the ith pair be denoted by (T1i, T2i). Then (T1i, T2i) is
defined as follows:
( ˜
T
1i, ˜
T2i) if max( ˜
T1i, ˜
T2i) < Zi
( ˜
T1i, Zi) if ˜
T1i < Zi < ˜
T2i
(T1i, T2i) =
(13)
(Zi, ˜
T2i) if ˜
T2i < Zi < ˜
T1i
(Zi, Zi)
if Zi < min( ˜
T1i, ˜
T2i)
There are six different types of events which might occur with respect to (T1i, T2i),
i = 1, . . . , n. These are the following:
1.
Type 1: ˜
T1i < ˜
T2i < Zi
2.
Type 2: ˜
T2i < ˜
T1i < Zi
3.
Type 3: ˜
T1i = ˜
T2i < Zi
4.
Type 4: ˜
T1i < Zi < ˜
T2i
5.
Type 5: ˜
T2i < ZI < ˜
T1i
6.
Type 6: Zi < min( ˜
T1i, ˜
T2i)
Below, the probability density functions for Type 1 to 5 and the survival function for
Type 6 of (T1i, T2i) are stated.
f1(t1, t2) = σ2λ1(λ2 + λ3)(t1t2)σ−1e−2σX b−[λ1tσ1+(λ2+λ3)tσ2]e−σX b
(14)
f2(t1, t2) = σ2λ2(λ1 + λ3)(t1t2)σ−1e−2σX b−[λ2tσ2+(λ1+λ3)tσ1]e−σX b
(15)
f3(t, t)
= σλ3tσ−1e−σX b−(λ1+λ2+λ3)tσe−σX b
(16)
P
f
[t1 < ˜
T1 < t1 + δt | ˜
T2 > t2]P [ ˜
T2 > t2]
4(t1, t2)
=
lim
δt→ 0
δt
(17)
= σλ1tσ−1
1
e−σX b−[λ1tσ1+(λ2+λ3)tσ2]e−σX b
(18)
Parametric Bivariate Regression Analysis Based on Censored Samples: A Weibull Model
87
P
f
[t2 < ˜
T2 < t2 + δt | ˜
T1 > t1]P [ ˜
T1 > t1]
5(t1, t2)
=
lim
(19)
δt→ 0
δti
= σλ2tσ−1
2
e−σX b−[λ2tσ2+(λ1+λ3)tσ1]e−σX b
(20)
F (t, t)
= P [T1 > t, T2 > t] = e−(λ1+λ2+λ3)tσe−σX b
(21)
where
X = ( ˜
T1, ..., ˜
Tm)
(22)
b = (b1, ..., bm)
(23)
Let n1, n2, n3, n4, n5 and n6 be the numbers of observations representing the different
types of events with n = n1 + . . . + n6. Then the likelihood function L for a sample
(t11, t21), . . . , (t1n, t2n) is given as follows:
n1
n2
n3
n4
n5
n6
L =
f1
f2
f3
f4
f5
F
(24)
i=1
i=1
i=1
i=1
i=1
i=1
Discarding factors which do not contain any of the parameters and taking the logarithm
yields:
log L = (2n1+2n2+ n3+n4+n5) log σ+(n1+n4) log λ1+(n2+n5) log λ2+n3 log λ3 +
n1 log(λ2+λ3) + n2 log(λ1+λ3) + (σ−1)
log t1i + (σ−1)
log t2i −
i∈A
i∈B
n
σ
Xib − σ
Xib −
(λ1tσ1i + λ2tσ2i + λ3tσ(2)i)e−σXib
(25)
i∈C
i∈D
i=1
where
A = {i | t1i < zi}
(26)
B = {i | t2i < zi}
(27)
C = {i | max(t1i, t2i) < zi, t1i = t2i}
(28)
D = {i | t1i < zi or t2i < zi}
(29)
= C ∪ {i | t1i = t2i < zi} ∪ {i|t(1)i < zi < t(2)i}
(30)
with
t(1)i = min(t1i, t2i)
(31)
t(2)i = max(t1i, t2i)
(32)
88
David D. Hanagal
The likelihood equations are given by
∂
n
log L
(n1 + n4)
n2
−
tσ
∂λ
=
+
1ie−σXib = 0
(33)
1
λ1
λ1 + λ3
i=1
∂
n
log L
(n2 + n5)
n1
−
tσ
∂λ
=
+
2ie−σXib = 0
(34)
2
λ2
λ2 + λ3
i=1
∂
n
log L
n3
n1
n2
−
tσ
∂λ
=
+
+
(2)ie−σXib = 0
(35)
3
λ3
λ2 + λ3
λ1 + λ3
i=1
∂ log L
(2n1 + 2n2 + n3 + n4 + n5)
∂σ
=
σ
+
log t1i +
log t2i −
Xib −
Xib −
i∈A
i∈B
i∈C
i∈D
n
n
λ1
tσ1ie−σXib[log t1i − Xib] − λ2
tσ2ie−σXib[log t2i − Xib] −
i=1
i=1
n
λ3
tσ(2)ie−σXib[log t(2)i − Xib] = 0
(36)
i=1
∂
n
log L
∂b
= −σ
Xji − σ
Xji + σ
(λ1tσ1i + λ2tσ2i + λ3tσ(2)i)Xjie−σXib = 0 (37)
j
i∈C
i∈D
i=1
for j = 1, ..., m.
The likelihood equations may be solved by a Newton-Raphson procedure, where the
second order partial derivatives of the log-likelihood function are given by:
∂2 log L
n2
= −(n1 + n4) −
(38)
∂λ21
λ21
(λ1 + λ3)2
∂2 log L
n1
= −(n2 + n5) −
(39)
∂λ22
λ22
(λ2 + λ3)2
∂2 log L
n3
n1
n2
= −
−
−
(40)
∂λ23
λ23
(λ2 + λ3)2
(λ1 + λ3)2
∂2 log L
∂λ
= 0
(41)
1∂λ2
∂2 log L
n2
∂λ
= −
(42)
1∂λ3
(λ1 + λ3)2
∂2 log L
n1
∂λ
= −
(43)
2∂λ3
(λ2 + λ3)2
∂2
n
log L
− λ
tσ
∂σ
= −(2n1 + 2n2 + n3 + n4 + n5)
2
σ2
1
1ie−σXib(log t1i − Xib)2 −
i=1
n
n
λ2
tσ2ie−σXib(log t2i − Xib)2 − λ3
tσ(2)ie−σXib(log t(2)i − Xib)2 (44)
i=1
i=1
Parametric Bivariate Regression Analysis Based on Censored Samples: A Weibull Model
89
∂2
n
log L
tσ
∂σ∂λ
= −
1ie−σXib(log t1i − Xib)
(45)
1
i=1
∂2
n
log L
tσ
∂σ∂λ
= −
2ie−σXib(log t2i − Xib)
(46)
2
i=1
∂2
n
log L
tσ
∂σ∂λ
= −
(2)ie−σXib(log t(2)i − Xib)
(47)
3
i=1
∂2
n
log L
∂b
= −σ2
(λ1tσ1i + λ2tσ2i + λ3tσ(2)i)XjiXkie−σXib
j, k = 1, ..., m
(48)
j ∂bk
i=1
∂2
n
log L
tσ
∂b
= σ
1iXjie−σXib
j = 1, ..., m
(49)
j ∂λ1
i=1
∂2
n
log L
tσ
∂b
= σ
2iXjie−σXib
j = 1, ..., m
(50)
j ∂λ2
i=1
∂2logL
n
tσ
∂b
= σ
(2)iXjie−σXib
j = 1, ..., m
(51)
j ∂λ3
i=1
∂2
n
log L
X
X
∂b
ji −
ji +
(λ1tσ1i + λ2tσ2i + λ3tσ(2)i)Xjie−σXib +
j ∂σ
= −
i∈C
i∈D
i=1
n
σ
Xjie−σXib(λ1tσ1i(log t1i − Xib) + λ2T σ2i(log t2i − Xib) +
i=1
λ3tσ(2)i(log t(2)i − Xib))
j = 1, ..., m
(52)
The observed Fisher information matrix, I is a (m + 4) × (m + 4) matrix, where the entries
are second order partial derivatives displayed above.
∂2logL
∂2logL
∂2logL
∂λi∂λj ∂λi∂σ ∂λi∂bl
∂2logL
∂2logL
∂2logL
I = −
∂λj∂σ
∂σ2
∂σ∂bl .
∂2logL
∂2logL
∂2logL
∂λj∂bk
∂σ∂bk
∂bl∂bk
The inverse of the observed Fisher information matrix is the observed variance-covariance
matrix
Σ = I−1
(53)
of the ML-estimator ˆ
θ
=
(ˆ
λ1, ˆλ2, ˆλ3, ˆσ, ˆb1, ..., ˆbm) of the regression parameter
θ = (λ1, λ2, λ3, σ, b1, ..., bm) .
√
The quantity
n(ˆθ − θ) has an asymptotic multivariate normal distribution with mean
vector zero and observed variance-covariance matrix Σ.
90
David D. Hanagal
4
Significance Test for the Regression Coefficients and Fitting
the Model
The hypotheses about b can be frequently put in the form Ho : b1 = 0, with b partitioned
as b = (b1, b2) where b1 is a k-dimensional vector with (k < m). To test H0 against
the alternative that b1 = 0 one can use
Λ
ˆ
1
= ˆ
b1Σ−1
11 b1
(54)
where Σ11 is k × k asymptotic observed variance-covariance matrix of ˆ
b1. Under H0, Λ1
is asymptotically chi-square with k degrees of freedom.
For fitting the regression model, we use forward selection method. The procedure is
as follows. We take one covariate at a time in the regression model and testing for
the corresponding regression coefficient. Go on adding the covariates in the regression
model until all regression coefficients in the model are significant at least at 5% level of
significance.
References
[1]
Hanagal, D.D.(1992): Some inference results in bivariate exponential distributions
based on censored samples. Comm. Statistics, Theory and Methods 21, 1273-95.
[2]
Hanagal, D.D. (1992): Some inference results in modified Freund’s bivariate expo-
nential distribution. Biometrical Journal 34, 745-56.
[3]
Hanagal, D.D.(1996): A multivariate Weibull distribution. Economic Quality Con-
trol, 11, 193-200.
[4]
Marshall, A.W. and Olkin, I.(1967): A multivariate exponential distribution. Jour-
nal of Amer. Statist. Assoc. 62, 30-44.
David D. Hanagal
Department of Statistics
University of Pune
Pune-411007
India