Minor Discrepancy in Results between SAS Proc Genmod and R GLM

SAS and R are fitting the same model here; they differ only in how they parameterize it. Reading up on how linear models are parameterized, in particular how a reference level is chosen for a categorical predictor, supplies the details of why the two programs report different coefficients. Any remaining differences in the trailing digits are on the order of the default parameter convergence tolerance, which is $10^{-4}$. The question below concerns reproducing a SAS PROC GENMOD fit with R's glm.


Question:

In my attempt to replicate a model fit from SAS proc genmod using R’s glm, I have obtained identical estimates and SEs for all variables, with the exception of the intercept and the Distance coefficient.

SAS:

    *Binomial -- pearson scale;
    proc genmod data=range.daily3;
     class ID;
      model Nr/Ne = Distance ID Distance*ID Noise_Quotient Temp_C Turbid__NTU   Sal_ppt PREC NWind EWind Time_Deployed / dist=binomial link=logit scale=p;
    run;
Analysis Of Maximum Likelihood Parameter Estimates
Parameter         DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
Intercept          1    4.6028          0.1511        4.3067        4.8989           928.45      <.0001
Distance           1   -0.0043          0.0001       -0.0045       -0.0040          1108.13      <.0001
ID 1757            1    2.5452          0.1818        2.1889        2.9016           196.03      <.0001
ID 2459            0    0.0000          0.0000        0.0000        0.0000                .           .
Distance*ID 1757   1   -0.0006          0.0002       -0.0010       -0.0001             5.42      0.0200
Distance*ID 2459   0    0.0000          0.0000        0.0000        0.0000                .           .
Noise_Quotient     1   -0.0003          0.0000       -0.0003       -0.0002            45.66      <.0001
Temp_C             1   -0.0425          0.0041       -0.0506       -0.0343           104.89      <.0001
Turbid__NTU        1   -0.0209          0.0024       -0.0257       -0.0161            73.61      <.0001
Sal_ppt            1   -0.0331          0.0188       -0.0699        0.0037             3.10      0.0783
PREC               1    0.0058          0.0020        0.0018        0.0098             8.03      0.0046
NWind              1   -0.0495          0.0061       -0.0613       -0.0376            66.75      <.0001
EWind              1    0.0609          0.0084        0.0444        0.0774            52.61      <.0001
Time_Deployed      1   -0.0041          0.0003       -0.0048       -0.0035           152.59      <.0001
Scale              0    5.4044          0.0000        5.4044        5.4044

In R, the ID variable was transformed into a factor after the data was imported in .csv format.

longtermrange2 <- read.csv("C:/Users/Data/Ashley's Google Drive/Telemetry Data/Dissertation/Chapter1/Long-Term Range/longtermrange2.csv", sep=',',header=T, na.strings=NA)
longtermrange2$ID <- as.factor(longtermrange2$ID)

R:

quasi <- glm(cbind(Nr, Ne-Nr) ~ Noise_Quotient + Temp_C + Distance*ID + Turbid__NTU + Sal_ppt + PREC + NWind + EWind + Time_Deployed,
         family=quasibinomial(link=logit), data=longtermrange2)
summary(quasi, dispersion=sum(residuals(quasi,"pearson")^2)/quasi$df.residual)
Call:
glm(formula = cbind(Nr, Ne - Nr) ~ Noise_Quotient + Temp_C + 
    Distance * ID + Turbid__NTU + Sal_ppt + PREC + NWind + EWind + 
    Time_Deployed, family = quasibinomial(link = logit), data = longtermrange2)
Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-20.5450   -3.4402    0.9021    3.7065   20.8752  
Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)      7.1480566  0.2076962  34.416  < 2e-16 ***
Noise_Quotient  -0.0002608  0.0000386  -6.757 1.40e-11 ***
Temp_C          -0.0424589  0.0041457 -10.242  < 2e-16 ***
Distance        -0.0048587  0.0002086 -23.295  < 2e-16 ***
ID2459          -2.5452458  0.1817872 -14.001  < 2e-16 ***
Turbid__NTU     -0.0209159  0.0024378  -8.580  < 2e-16 ***
Sal_ppt         -0.0330704  0.0187836  -1.761  0.07831 .  
PREC             0.0058018  0.0020478   2.833  0.00461 ** 
NWind           -0.0494535  0.0060530  -8.170 3.08e-16 ***
EWind            0.0609026  0.0083965   7.253 4.07e-13 ***
Time_Deployed   -0.0041183  0.0003334 -12.353  < 2e-16 ***
Distance:ID2459  0.0005680  0.0002440   2.327  0.01995 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasibinomial family taken to be 29.20711)
    Null deviance: 231850  on 2699  degrees of freedom
Residual deviance:  74621  on 2688  degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5

The only coefficients that differ are the Distance and Intercept estimates, along with their respective standard errors.

Do you have any thoughts on the possible cause of this?


Solution:

R and SAS have chosen different reference levels for the ID factor: R uses the first level (1757) as the reference, whereas SAS proc genmod uses the last level (2459).


SAS

Parameter   DF Estimate 
Intercept   1  4.6028 
ID 1757     1  2.5452   
ID 2459     0  0.0000 


R

                 Estimate  
(Intercept)      7.1480566 
ID2459          -2.5452458
ID1757           0.0

Please be aware that R does not display the reference level; the ID1757 row above is only my deduction.
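Rather than deducing the reference level, it can be read off the factor itself. A minimal sketch, using a small made-up vector as a stand-in for longtermrange2$ID (the values are hypothetical; only the two level names come from the question):

```r
# Hypothetical stand-in for longtermrange2$ID
ID <- as.factor(c(1757, 2459, 1757, 2459))

# Under R's default treatment contrasts, the first level is the reference
print(levels(ID))

# The row of zeros in the contrast matrix marks the reference level
print(contrasts(ID))
```

Running levels() and contrasts() on the real column should show 1757 as the first level, which is why no ID1757 row appears in the R summary.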

Note that in the linear predictor for class 1757, the SAS coefficients combine as

4.6028 + 2.5452 = 7.148

which is the intercept of the R model (7.1480566). The two models split the same effect differently between the intercept and the ID coefficient, so they produce the same predictions.
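The same arithmetic can be checked for both ID levels using the estimates copied from the two outputs in the question. This is only a sketch; the variable names are made up, and the comparison is limited by the four decimals that SAS prints.

```r
# Rounded SAS estimates (reference level 2459), copied from the SAS output
sas_intercept <- 4.6028
sas_id1757    <- 2.5452

# R estimates (reference level 1757), copied from the R output
r_intercept <- 7.1480566
r_id2459    <- -2.5452458

# Intercept part of the linear predictor for each ID level
# (Distance and all other covariates set to 0)
sas_pred <- c("1757" = sas_intercept + sas_id1757, "2459" = sas_intercept)
r_pred   <- c("1757" = r_intercept,                "2459" = r_intercept + r_id2459)

# Absolute differences; these should be no larger than SAS's print rounding
print(abs(sas_pred - r_pred))
```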

The coefficients for the Distance*ID interaction behave in the same way.


SAS

Parameter       DF Estimate 
Distance        1  -0.0043
Distance*ID1757 1  -0.0006  
Distance*ID2459 0  0.0000 


R

                 Estimate   Std. Error
Distance        -0.0048587  0.0002086
Distance:ID2459  0.0005680  0.0002440
Distance:ID1757  0.0

This time, observe that

-0.0043 + -0.0006 = -0.0049

which, up to rounding, is the coefficient of Distance in the R model (-0.0048587).
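To get R to report the SAS parameterization directly, the reference level of the factor can be changed with relevel() before fitting. The sketch below uses simulated data, since the original longtermrange2 data are not available here; the simulated coefficients, the sample size, and the column name ID_sas are assumptions made purely for illustration.

```r
set.seed(1)

# Simulated stand-in for the real data: two ID levels, one Distance covariate
n <- 400
d <- data.frame(
  ID       = factor(rep(c("1757", "2459"), each = n / 2)),
  Distance = runif(n, 0, 1000),
  Ne       = 50
)
eta  <- with(d, 4.6 - 0.0043 * Distance +
                ifelse(ID == "1757", 2.5 - 0.0006 * Distance, 0))
d$Nr <- rbinom(n, d$Ne, plogis(eta))

# R default: first level (1757) is the reference
fit_r <- glm(cbind(Nr, Ne - Nr) ~ Distance * ID,
             family = quasibinomial(link = logit), data = d)

# SAS-style: make the last level (2459) the reference
d$ID_sas <- relevel(d$ID, ref = "2459")
fit_sas  <- glm(cbind(Nr, Ne - Nr) ~ Distance * ID_sas,
                family = quasibinomial(link = logit), data = d)

b_r <- coef(fit_r)
b_s <- coef(fit_sas)

# Same relations as in the answer: intercepts and Distance slopes line up
# once the 1757 effects are added to the SAS-style baseline
checks <- c(
  intercept = isTRUE(all.equal(
    unname(b_s["(Intercept)"] + b_s["ID_sas1757"]),
    unname(b_r["(Intercept)"]), tolerance = 1e-6)),
  distance = isTRUE(all.equal(
    unname(b_s["Distance"] + b_s["Distance:ID_sas1757"]),
    unname(b_r["Distance"]), tolerance = 1e-6)),
  fitted = isTRUE(all.equal(fitted(fit_r), fitted(fit_sas), tolerance = 1e-6))
)
print(checks)
```

On the real data, the equivalent step would be to relevel longtermrange2$ID with ref = "2459" before calling glm; the fitted values do not change, only the way the effects are split across coefficients.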
