Posted: March 8th, 2020

Application of Regression Analysis

In applications of regression analysis, the data set often contains unusual observations that are either outliers (noise) or influential observations. These observations may have large residuals, may affect the estimated regression coefficients and the whole regression analysis, and can become the source of misleading results and interpretations. It is therefore very important to examine these suspected observations carefully and to decide whether they should be included in or removed from the analysis.

In regression analysis, a basic step is to determine whether one or more observations can influence the results and interpretations of the analysis. If the regression model has one independent variable, it is easy to detect unusual observations in the dependent and independent variables by using a scatter plot, box plot, residual plot, etc. However, identifying outliers and/or influential observations graphically is a subjective approach. It is also well known that in the presence of multiple outliers there can be a masking or swamping effect. Masking (false negative) occurs when an outlying subset remains undetected due to the presence of another, usually adjacent, subset. Swamping (false positive) occurs when a usual observation is incorrectly identified as an outlier in the presence of another, usually remote, subset of observations.

In the present study, some well-known diagnostics are compared for identifying multiple influential observations. First, robust regression methods are used to identify influential observations in Poisson regression. Then, to confirm that the observations identified by the robust regression method are genuine influential observations, diagnostic measures based on the single-case deletion approach are considered: Pearson chi-square, deviance residual, hat matrix, likelihood residual test, Cook’s distance, difference of fits and squared difference in beta. In the presence of masking and swamping, however, diagnostics based on single-case deletion fail to identify outliers and influential observations. Therefore, to remove or minimize the masking and swamping phenomena, some group deletion approaches are used: the generalized standardized Pearson residual, generalized difference of fits and generalized squared difference in beta.

3.2 Diagnostic measures based on single case deletion

This section presents the details of the single-case deletion measures used to identify multiple influential observations in the Poisson regression model. These measures are the change in Pearson chi-square, change in deviance, hat matrix, likelihood residual test, Cook’s distance, difference of fits (DFFITS) and squared difference in beta (SDFBETA).

3.2.1 Pearson chi-square

To show the amount of change in the Poisson regression estimates that would occur if the kth observation were deleted, the Pearson χ2 statistic is used to detect outliers. Such diagnostics examine the effect of deleting a single case on the overall summary measures of fit.

Let $\chi^2$ denote the Pearson $\chi^2$ statistic and $\chi^2_{(-k)}$ the statistic after case $k$ is deleted, using the one-step linear approximations given by Pregibon (1981). The decrease in the value of the statistic due to deletion of the $k$th case is

$$\Delta\chi^2_{(-k)} = \chi^2 - \chi^2_{(-k)}, \quad k = 1, 2, \ldots, n \tag{3.1}$$

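where $\chi^2$ is defined as:

$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i} \tag{3.2}$$

and, with the $k$th case deleted, as:

$$\chi^2_{(-k)} = \sum_{i \neq k} \frac{(y_i - \hat{\mu}_{i(-k)})^2}{\hat{\mu}_{i(-k)}} \tag{3.3}$$

where $\hat{\mu}_{i(-k)}$ is the fitted value of case $i$ when the model is fitted without case $k$. Under the one-step approximation, the decrease in (3.1) reduces to $\Delta\chi^2_{(-k)} \approx r_k^2/(1 - h_{kk})$, where $r_k = (y_k - \hat{\mu}_k)/\sqrt{\hat{\mu}_k}$ is the Pearson residual and $h_{kk}$ is the leverage of case $k$ (the $k$th diagonal element of the hat matrix described below).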
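The Pearson statistic and its one-step decrease can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names are hypothetical, the fitted means `mu` and leverages `h` are assumed to come from an already fitted full-data Poisson model, and the decrease uses the standard one-step approximation $r_k^2/(1 - h_{kk})$ instead of refitting the model $n$ times.

```python
def pearson_chi2(y, mu):
    """Pearson chi-square statistic of a fitted Poisson model, Eq. (3.2)."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def pearson_chi2_drop_one_step(y, mu, h):
    """One-step approximation to chi2 - chi2_(-k) for every case k.

    Uses r_k^2 / (1 - h_kk), where r_k = (y_k - mu_k) / sqrt(mu_k) is the
    Pearson residual and h_kk the leverage; no model is refitted.
    """
    return [(yi - mi) ** 2 / (mi * (1.0 - hi)) for yi, mi, hi in zip(y, mu, h)]


# Hypothetical fitted means and leverages, for illustration only.
y = [2, 3, 10]
mu = [2.5, 3.0, 4.0]
h = [0.2, 0.3, 0.5]
print(pearson_chi2(y, mu))
print(pearson_chi2_drop_one_step(y, mu, h))
```

The case with the largest approximate decrease is the one the statistic points to as a candidate outlier.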

3.2.2 Deviance residual

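The change in deviance when the $k$th case is deleted is:

$$\Delta D_{(-k)} = D - D_{(-k)} \tag{3.4}$$

and its one-step linear approximation (Pregibon, 1981) is $\Delta D_{(-k)} \approx d_k^2 + r_k^2 h_{kk}/(1 - h_{kk})$, where $d_k$ and $r_k$ are the deviance and Pearson residuals of case $k$.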


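Because the deviance measures the goodness of fit of a model, a substantial decrease in the deviance after deletion of the $k$th observation indicates that this observation is a misfit. The deviance of the Poisson regression with all observations is:

$$D = 2\sum_{i=1}^{n} \left[ y_i \log\frac{y_i}{\hat{\mu}_i} - (y_i - \hat{\mu}_i) \right] \tag{3.5}$$

where $\hat{\mu}_i = \exp(x_i^T\hat{\beta})$ and $y_i \log(y_i/\hat{\mu}_i)$ is taken as 0 when $y_i = 0$. With the $k$th observation deleted,

$$D_{(-k)} = 2\sum_{i \neq k} \left[ y_i \log\frac{y_i}{\hat{\mu}_{i(-k)}} - (y_i - \hat{\mu}_{i(-k)}) \right] \tag{3.6}$$

A large value of $\Delta D_{(-k)}$ indicates that the $k$th observation is an outlier.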
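A comparable sketch for the deviance, again assuming `mu` and `h` come from the full-data fit and using hypothetical function names. The one-step change in deviance is approximated here by $d_k^2 + r_k^2 h_{kk}/(1 - h_{kk})$, the form usually attributed to Pregibon (1981); an exact version would refit the model without case $k$.

```python
import math


def unit_deviance(y, mu):
    """2 * [y log(y / mu) - (y - mu)], with the convention 0 * log(0) = 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    # Clamp tiny negative values caused by floating-point rounding when y ~ mu.
    return max(0.0, 2.0 * (term - (y - mu)))


def poisson_deviance(y, mu):
    """Poisson deviance D, Eq. (3.5)."""
    return sum(unit_deviance(yi, mi) for yi, mi in zip(y, mu))


def deviance_drop_one_step(y, mu, h):
    """One-step approximation to D - D_(-k): d_k^2 + r_k^2 * h_kk / (1 - h_kk)."""
    out = []
    for yi, mi, hi in zip(y, mu, h):
        d2 = unit_deviance(yi, mi)    # squared deviance residual d_k^2
        r2 = (yi - mi) ** 2 / mi      # squared Pearson residual r_k^2
        out.append(d2 + r2 * hi / (1.0 - hi))
    return out


# Hypothetical fitted means and leverages, for illustration only.
print(poisson_deviance([0, 4], [1.0, 2.0]))
print(deviance_drop_one_step([0, 4], [1.0, 2.0], [0.5, 0.0]))
```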

3.2.3 Hat matrix

The hat matrix is used in residual diagnostics to measure the influence of each observation. The hat values, $h_{ii}$, are the diagonal entries of the hat matrix, which is calculated using

$$H = V^{1/2} X (X^T V X)^{-1} X^T V^{1/2} \tag{3.7}$$

where $V = \operatorname{diag}\left[\operatorname{var}(y_i)\,\{g'(\mu_i)\}^2\right]^{-1}$.


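In the Poisson regression model, $\eta_i = g(\mu_i) = x_i^T\beta$, where the function $g$ is usually called the link function. With the log link used in Poisson regression, $g(\mu_i) = \log\mu_i$, so that $g'(\mu_i) = 1/\mu_i$ and $\operatorname{var}(y_i) = \mu_i$, which gives

$$V = \operatorname{diag}(\hat{\mu}_1, \hat{\mu}_2, \ldots, \hat{\mu}_n) \tag{3.8}$$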

$(X^T V X)^{-1}$ is the estimated covariance matrix of $\hat{\beta}$, and $h_{ii}$ is the $i$th diagonal element of the hat matrix $H$. The diagonal elements of the hat matrix, i.e. the leverage values, have the properties

$$0 \le h_{ii} \le 1, \qquad \sum_{i=1}^{n} h_{ii} = k$$

where $k$ is the number of parameters in the regression model, including the intercept term. The mean leverage is therefore $k/n$, and an observation is said to be influential if $h_{ii} > ck/n$, where $c$ is a suitable constant such as 2, 3 or more. Using the twice-the-mean rule of thumb suggested by Hoaglin and Welsch (1978), an observation with $h_{ii} > 2k/n$ is considered influential.
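The hat values in (3.7) and (3.8) require a fitted model. The sketch below is one way to obtain both, assuming NumPy is available: it fits the Poisson log-link model by iteratively reweighted least squares (a standard GLM fitting algorithm, swapped in here so the example is self-contained, not taken from the source), then computes $h_{ii} = \hat{\mu}_i x_i^T (X^T V X)^{-1} x_i$ and applies the $ck/n$ rule. The function names and the data are hypothetical.

```python
import numpy as np


def fit_poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Poisson log-link MLE by iteratively reweighted least squares.

    Returns (beta_hat, mu_hat).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y + 0.5                       # common GLM starting values (avoid log 0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu        # working response for the log link
        XtV = X.T * mu                 # X^T V with V = diag(mu), Eq. (3.8)
        beta_new = np.linalg.solve(XtV @ X, XtV @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if done:
            break
    return beta, mu


def poisson_leverages(X, mu):
    """Diagonal of H = V^{1/2} X (X^T V X)^{-1} X^T V^{1/2}, Eq. (3.7)."""
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    cov = np.linalg.inv((X.T * mu) @ X)               # (X^T V X)^{-1}
    return mu * np.einsum("ij,jk,ik->i", X, cov, X)   # mu_i * x_i^T cov x_i


def high_leverage(h, k, c=2.0):
    """Indices with h_ii > c * k / n; c = 2 is the twice-the-mean rule."""
    cut = c * k / len(h)
    return [i for i, hi in enumerate(h) if hi > cut]


# Hypothetical data: intercept plus one covariate with a remote value at x = 15.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 15.0])
y = np.array([1, 2, 2, 3, 5, 6, 8, 30])
X = np.column_stack([np.ones_like(x), x])
beta, mu = fit_poisson_irls(X, y)
h = poisson_leverages(X, mu)
print(np.round(h, 3), high_leverage(h, k=X.shape[1]))
```

Because $\sum h_{ii} = k$, checking that `h.sum()` equals the number of columns of `X` is a cheap sanity test of the computation.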

3.2.4 Likelihood residual test

For the detection of outliers, Williams (1987) introduced the likelihood residual. The squared likelihood residual is a weighted average of the squared standardized deviance and Pearson residuals, defined as:

$$r_{L_i}^2 = h_{ii}\, r'^2_{P_i} + (1 - h_{ii})\, r'^2_{D_i} \tag{3.9}$$
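It is approximately equal to the likelihood ratio statistic for testing whether an observation is an outlier, and it is also called the approximate studentized residual. Here $r'_{P_i}$ is the standardized Pearson residual, defined as:

$$r'_{P_i} = \frac{y_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i (1 - h_{ii})}} \tag{3.10}$$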

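and $r'_{D_i}$ is the standardized deviance residual, defined as:

$$r'_{D_i} = \frac{d_i}{\sqrt{1 - h_{ii}}} \tag{3.11}$$

$$d_i = \operatorname{sign}(y_i - \hat{\mu}_i)\sqrt{2\left[ y_i \log\frac{y_i}{\hat{\mu}_i} - (y_i - \hat{\mu}_i) \right]}$$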

where $d_i$ is called the deviance residual. It is another popular residual because the sum of squares of these residuals is the deviance statistic.

Because the average value of $h_{ii}$, namely $k/n$, is small, $r_{L_i}$ is much closer to $r'_{D_i}$ than to $r'_{P_i}$ and is therefore also approximately normally distributed. An observation is considered to be influential if $|r_{L_i}|$ exceeds the critical value $t(1, n$…
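A minimal sketch of Williams' likelihood residual for the Poisson case, assuming `mu` and `h` come from the full-data fit. The function name is hypothetical, and the sign of $r_{L_i}$ is taken from $y_i - \hat{\mu}_i$.

```python
import math


def likelihood_residuals(y, mu, h):
    """Williams' (1987) likelihood residuals r_L for a Poisson fit.

    r_L^2 = h * r'_P^2 + (1 - h) * r'_D^2, signed like y - mu.
    """
    out = []
    for yi, mi, hi in zip(y, mu, h):
        # Standardized Pearson residual, Eq. (3.10)
        rp = (yi - mi) / math.sqrt(mi * (1.0 - hi))
        # Deviance residual d_i (0 * log 0 taken as 0), then Eq. (3.11)
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d = math.copysign(math.sqrt(max(0.0, 2.0 * (term - (yi - mi)))), yi - mi)
        rd = d / math.sqrt(1.0 - hi)
        # Leverage-weighted average of the squared standardized residuals
        rl2 = hi * rp ** 2 + (1.0 - hi) * rd ** 2
        out.append(math.copysign(math.sqrt(rl2), yi - mi))
    return out


# Hypothetical fitted means and leverages, for illustration only.
print(likelihood_residuals([4, 0, 3], [2.0, 1.5, 3.0], [0.0, 0.3, 0.2]))
```

Algebraically, $r_{L_i}^2 = d_i^2 + r_i^2 h_{ii}/(1 - h_{ii})$ with $r_i$ the unstandardized Pearson residual, which is the same one-step quantity used for the change in deviance. The sketch leaves the choice of $t$ critical value to the caller.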

3.2.5 Difference of fits test (DFFITS)

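The difference of fits test for Poisson regression is defined as:

$$(\mathrm{DFFITS})_i = \frac{\hat{\mu}_i - \hat{\mu}_{i(-i)}}{\hat{\sigma}_{(-i)}\sqrt{h_{ii}}}, \quad i = 1, 2, \ldots, n \tag{3.12}$$

where $\hat{\mu}_{i(-i)}$ and $\hat{\sigma}_{(-i)}$ are, respectively, the $i$th fitted response and the estimated standard error when the $i$th observation is deleted. DFFITS can be expressed in terms of standardized Pearson residuals and leverage values as:

$$(\mathrm{DFFITS})_i = r'_{P_i}\sqrt{\frac{h_{ii}}{1 - h_{ii}}} = \frac{(y_i - \hat{\mu}_i)\sqrt{h_{ii}}}{\sqrt{\hat{\mu}_i}\,(1 - h_{ii})} \tag{3.13}$$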

An observation is said to be influential if $|(\mathrm{DFFITS})_i| > 2\sqrt{k/n}$.
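DFFITS in the residual-and-leverage form (3.13) needs no refitting. In this sketch the function names are hypothetical, `mu` and `h` are assumed to come from the full fit, and the flagging rule uses $|\mathrm{DFFITS}_i| > 2\sqrt{k/n}$, the size-adjusted cut-off of Belsley et al. (1980); the constant is exposed as a parameter because other cut-offs are also used in practice.

```python
import math


def dffits(y, mu, h):
    """One-step DFFITS for a Poisson fit, Eq. (3.13): r'_P * sqrt(h / (1 - h))."""
    out = []
    for yi, mi, hi in zip(y, mu, h):
        rp = (yi - mi) / math.sqrt(mi * (1.0 - hi))   # standardized Pearson residual
        out.append(rp * math.sqrt(hi / (1.0 - hi)))
    return out


def flag_dffits(values, k, c=2.0):
    """Indices with |DFFITS| > c * sqrt(k / n); c = 2 is an assumed default."""
    cut = c * math.sqrt(k / len(values))
    return [i for i, v in enumerate(values) if abs(v) > cut]


# Hypothetical fitted means and leverages for a k = 2 parameter model.
v = dffits([2, 5, 3], [1.0, 1.0, 3.0], [0.5, 0.5, 0.1])
print(v, flag_dffits(v, k=2))
```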

3.2.6 Cook’s Distance

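Cook (1977) suggested a statistic that measures the change in the parameter estimates caused by deleting each observation, defined as:

$$CD_i = \frac{(\hat{\beta} - \hat{\beta}_{(-i)})^T (X^T V X) (\hat{\beta} - \hat{\beta}_{(-i)})}{k\,\hat{\sigma}^2} \tag{3.14}$$

where $\hat{\beta}_{(-i)}$ is the estimate of $\beta$ without the $i$th observation. There is also a relationship between the difference of fits test and Cook’s distance, which can be expressed as:

$$CD_i = \frac{\hat{\sigma}^2_{(-i)}}{k\,\hat{\sigma}^2}\,(\mathrm{DFFITS})_i^2 \tag{3.15}$$

Using the one-step approximation suggested by Pregibon (1981), CD can be expressed as:

$$CD_i \approx \frac{r'^2_{P_i}}{k}\,\frac{h_{ii}}{1 - h_{ii}} \tag{3.16}$$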

An observation with a CD value greater than 1 is treated as influential.
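The one-step Cook's distance (3.16) can be computed from the same ingredients. The sketch assumes a Poisson dispersion of $\sigma^2 = 1$ and uses hypothetical function names; under that assumption $CD_i$ equals $(\mathrm{DFFITS}_i)^2/k$ in its one-step form.

```python
def cooks_distance_one_step(y, mu, h, k):
    """One-step Cook's distance for a Poisson fit: r'_P^2 * h / (k * (1 - h)).

    Assumes the dispersion sigma^2 = 1, as for a Poisson model.
    """
    out = []
    for yi, mi, hi in zip(y, mu, h):
        rp2 = (yi - mi) ** 2 / (mi * (1.0 - hi))   # squared standardized Pearson residual
        out.append(rp2 * hi / (k * (1.0 - hi)))
    return out


def flag_cooks(cd, cut=1.0):
    """Indices whose Cook's distance exceeds the cut-off (1 in the text)."""
    return [i for i, c in enumerate(cd) if c > cut]


# Hypothetical fitted means and leverages for a k = 2 parameter model.
cd = cooks_distance_one_step([2, 5, 3], [1.0, 1.0, 3.0], [0.5, 0.5, 0.1], k=2)
print(cd, flag_cooks(cd))
```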

3.2.7 Squared Difference in Beta (SDFBETA)

This measure originates from the idea of Cook’s (1977) distance, a single-case deletion diagnostic, and is a modification of DFBETA (Belsley et al., 1980). It is defined as

(SDFBETA)i = 3.17

After some necessary calculations, SDFBETA can be related to DFFITS as:

(SDFBETA)i = 3.18

The ith observation is influential if (SDFBETA)i

3.3 Diagnostic measures based on group deletion approach

This section presents the details of the group-deletion measures used to identify multiple influential observations in the Poisson regression model. Multiple influential observations can distort the fit and can create masking or swamping effects. Diagnostics based on group deletion are effective for identifying multiple influential observations and are designed to be free from masking and swamping effects. These measures are the generalized standardized Pearson residual (GSPR), generalized difference of fits (GDFFITS) and generalized squared difference in Beta (GSDFBETA).

3.3.1 Generalized standardized Pearson residual (GSPR)

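Imon and Hadi (2008) introduced GSPR to identify multiple outliers. Let $D$ denote the group of suspected cases deleted from the data and $R$ the remaining group, and let $\hat{\mu}_i^{(R)}$ be the fitted values from the model fitted to $R$. GSPR is defined as:

$$r_i^{(R)} = \frac{y_i - \hat{\mu}_i^{(R)}}{\sqrt{\hat{v}_i^{(R)}\,(1 - h_{ii}^{(R)})}}, \quad i \in R \tag{3.19}$$

$$r_i^{(R)} = \frac{y_i - \hat{\mu}_i^{(R)}}{\sqrt{\hat{v}_i^{(R)}\,(1 + h_{ii}^{(R)})}}, \quad i \in D \tag{3.20}$$

where $\hat{v}_i^{(R)}$ and $h_{ii}^{(R)}$ are, respectively, the diagonal elements of $V$ and $H$ (the hat matrix) for the remaining group; for the Poisson model $\hat{v}_i^{(R)} = \hat{\mu}_i^{(R)}$ and $h_{ii}^{(R)} = \hat{\mu}_i^{(R)} x_i^T (X_R^T V_R X_R)^{-1} x_i$. Observations with $|\mathrm{GSPR}| > 3$ are considered outliers.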
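Group deletion needs a refit on the remaining cases only. The sketch below, assuming NumPy, fits the Poisson model to $R$ by iteratively reweighted least squares (a fitting routine chosen for self-containment, not necessarily the author's), evaluates $\hat{\mu}_i^{(R)}$ and $h_{ii}^{(R)} = \hat{\mu}_i^{(R)} x_i^T (X_R^T V_R X_R)^{-1} x_i$ for every case, and applies the $1 - h$ scaling for $R$ and the $1 + h$ scaling for $D$ from (3.19) and (3.20). How the suspect group $D$ is chosen (for example from a robust fit) is outside the sketch; the data and function names are hypothetical.

```python
import numpy as np


def fit_poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Poisson log-link MLE by iteratively reweighted least squares."""
    mu = y + 0.5                       # common GLM starting values
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu        # working response for the log link
        XtV = X.T * mu                 # X^T V with V = diag(mu)
        beta_new = np.linalg.solve(XtV @ X, XtV @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if done:
            break
    return beta


def gspr(X, y, deleted):
    """Generalized standardized Pearson residuals for the deleted group D.

    Returns (residuals, h_R, mu_R), each of length n.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    in_D = np.zeros(len(y), dtype=bool)
    in_D[np.asarray(list(deleted), dtype=int)] = True
    XR, yR = X[~in_D], y[~in_D]
    beta_R = fit_poisson_irls(XR, yR)
    mu_R = np.exp(X @ beta_R)                            # R-fit means for all cases
    cov_R = np.linalg.inv((XR.T * mu_R[~in_D]) @ XR)     # (X_R^T V_R X_R)^{-1}
    h_R = mu_R * np.einsum("ij,jk,ik->i", X, cov_R, X)   # h_ii^(R) for all cases
    scale = np.where(in_D, 1.0 + h_R, 1.0 - h_R)         # Eq. (3.20) vs (3.19)
    return (y - mu_R) / np.sqrt(mu_R * scale), h_R, mu_R


# Hypothetical data with one suspected case (index 9) placed in D.
x = np.arange(10, dtype=float)
y = np.array([2, 2, 3, 3, 4, 5, 6, 7, 9, 40])
X = np.column_stack([np.ones_like(x), x])
r, h_R, mu_R = gspr(X, y, deleted=[9])
print(np.round(r, 2), np.flatnonzero(np.abs(r) > 3))
```

With an empty `deleted` set the function reduces to the ordinary standardized Pearson residuals (3.10) of the full fit.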

3.3.2 Generalized difference of fits (GDFFITS)

The GDFFITS statistic can be expressed in terms of the GSPR (generalized standardized Pearson residual) and the GWs (generalized weights).

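The GWs are denoted by $h_{ii}^*$ and defined, in terms of the leverages $h_{ii}^{(R)}$ computed from the remaining group $R$ (with $D$ the deleted group), as:

$$h_{ii}^* = \frac{h_{ii}^{(R)}}{1 - h_{ii}^{(R)}}, \quad i \in R \tag{3.21}$$

$$h_{ii}^* = \frac{h_{ii}^{(R)}}{1 + h_{ii}^{(R)}}, \quad i \in D \tag{3.22}$$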

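An observation whose generalized weight $h_{ii}^*$ is larger than $\operatorname{Median}(h^*) + 3\,\operatorname{MAD}(h^*)$ is considered to be influential, i.e.

$$h_{ii}^* > \operatorname{Median}(h^*) + 3\,\operatorname{MAD}(h^*)$$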

Finally, GDFFITS is defined as

$$(\mathrm{GDFFITS})_i = \sqrt{h_{ii}^*}\,(\mathrm{GSPR})_i \tag{3.23}$$

We consider the observation as influential if
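Given the GSPR values and the remaining-group leverages $h_{ii}^{(R)}$, the generalized weights and GDFFITS reduce to element-wise arithmetic. This sketch assumes $\mathrm{GDFFITS}_i = \sqrt{h_{ii}^*}\,\mathrm{GSPR}_i$, the direct analogue of (3.13), and flags large weights with $\operatorname{Median}(h^*) + 3\,\operatorname{MAD}(h^*)$ using the raw median absolute deviation; both the constant 3 and the unscaled MAD are assumptions (some authors rescale MAD by $1/0.6745$), and the function names are hypothetical.

```python
import math
import statistics


def generalized_weights(h_R, deleted):
    """GWs h*_ii from remaining-group leverages h_ii^(R), Eqs. (3.21)-(3.22)."""
    D = set(deleted)
    return [h / (1.0 + h) if i in D else h / (1.0 - h) for i, h in enumerate(h_R)]


def high_gw(gw, c=3.0):
    """Indices with h*_ii > median(h*) + c * MAD(h*); raw (unscaled) MAD assumed."""
    med = statistics.median(gw)
    mad = statistics.median(abs(w - med) for w in gw)
    return [i for i, w in enumerate(gw) if w > med + c * mad]


def gdffits(gspr_values, gw):
    """GDFFITS_i = sqrt(h*_ii) * GSPR_i (assumed form, by analogy with DFFITS)."""
    return [math.sqrt(w) * r for r, w in zip(gspr_values, gw)]


# Hypothetical inputs: h_R and GSPR values for five cases, with case 4 in D.
h_R = [0.10, 0.12, 0.15, 0.11, 0.90]
gspr_values = [0.4, -1.1, 0.8, 0.2, 6.0]
gw = generalized_weights(h_R, deleted=[4])
print(gw, high_gw(gw), gdffits(gspr_values, gw))
```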


3.3.3 Generalized squared difference in Beta (GSDFBETA)

To identify multiple outliers in a dataset and to overcome the masking and swamping effects, GSDFBETA is defined as:

GSDFBETAi = for i 3.24

= for i 3.25

GSDFBETA can now be re-expressed in terms of GSPR and GWs:

GSDFBETAi = for i 3.26

= for i 3.27

A suggested cut-off value for the detection of influential observation is

