proc phreg estimate statement example

As in Example 1, you can also use the LSMEANS, LSMESTIMATE, and SLICE statements in PROC LOGISTIC, PROC GENMOD, and PROC GLIMMIX when dummy coding (PARAM=GLM) is used. The Nelson-Aalen estimator is a non-parametric estimator of the cumulative hazard function and is given by: \[\hat H(t) = \sum_{t_i \leq t}\frac{d_i}{n_i}\] First, there may be one row of data per subject, with one outcome variable representing the time to event, one variable that codes for whether the event occurred or not (censored), and explanatory variables of interest, each with fixed values across follow-up time. The solid lines represent the observed cumulative residuals, while dotted lines represent 20 simulated sets of residuals expected under the null hypothesis that the model is correctly specified. PROC GENMOD can also be used to estimate this odds ratio. Suppose it is of interest to test the null hypothesis that cell means ABC121 and ABC212 are equal, that is, \(H_0: \mu_{121} - \mu_{212} = 0\). The result, while not strictly an odds ratio, is useful as a comparison of the odds of treatment A to the "average" odds of the treatments. For each subject, the entirety of follow-up time is partitioned into intervals, each defined by a start and stop time. We compare 2 models, one with just a linear effect of bmi and one with both a linear and quadratic effect of bmi (in addition to our other covariates). The ESTIMATE statement provides a mechanism for obtaining custom hypothesis tests. The next two elements are the parameter estimates for the levels of B, 1 and 2. specifies which differences to consider for the level comparisons of a CLASS variable. Most of the variables are at least slightly correlated with the other variables. If we were to plot the estimate of \(S(t)\), we would see that it is a reflection of \(F(t)\) (about y=0 and shifted up by 1). At this stage we might be interested in expanding the model with more predictor effects.
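To make the Nelson-Aalen sum above concrete, here is a small worked example. The counts are hypothetical and chosen only for illustration; they do not come from the WHAS500 data. Suppose the first three event times have \(d_1 = 1, d_2 = 1, d_3 = 2\) failures among \(n_1 = 10, n_2 = 8, n_3 = 5\) subjects at risk. Then:

\[\hat H(t_3) = \frac{d_1}{n_1} + \frac{d_2}{n_2} + \frac{d_3}{n_3} = \frac{1}{10} + \frac{1}{8} + \frac{2}{5} = 0.625\]

Each term is the estimated hazard contribution at one event time, so the cumulative hazard can only stay flat or increase as \(t\) grows.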
The PHREG (Proportional Hazards Regression) procedure performs a semi-parametric regression analysis of survival data based on the Cox proportional hazards model. class gender; For the medical example, suppose we are interested in the odds ratio for treatment A versus treatment C in the complicated diagnosis. The GENMOD and GLIMMIX procedures provide separate CONTRAST and ESTIMATE statements. The SAS procedure PROC PHREG allows us to fit a proportional hazards model to a dataset. The EXP option provides the odds ratio estimate by exponentiating the difference. This convention can affect the way in which you specify the matrix in your CONTRAST statement. The following statements show all five ways of computing and testing this contrast. Instead, we need only assume that whatever the baseline hazard function is, covariate effects multiplicatively shift the hazard function and these multiplicative shifts are constant over time. The following ODDSRATIO statement provides the same estimate of the treatment A vs. treatment C odds ratio in the complicated diagnosis as above (along with odds ratio estimates for the other treatment pairs in that diagnosis). Comparing Nonnested Models We can estimate the hazard function in SAS as well using proc lifetest: As we have seen before, the hazard appears to be greatest at the beginning of follow-up time and then rapidly declines and finally levels off. Lin, DY, Wei, LJ, Ying, Z. The CONTRAST statement below defines seven rows in L for the seven interaction parameters, resulting in a 7 DF test that all interaction parameters are zero. None of the solid blue lines looks particularly aberrant, and all of the supremum tests are non-significant, so we conclude that proportional hazards holds for all of our covariates. The value must be between 0 and 1.
The next five elements are the parameter estimates for the levels of A, 1 through 5. I am looking at the interactive effects of X according to Y on death. Similarly, because we included a BMI*BMI interaction term in our model, the BMI term is interpreted as the effect of bmi when bmi is 0. Below we plot survivor curves across several ages for each gender through the following steps: As we surmised earlier, the effect of age appears to be more severe in males than in females, reflected by the greater separation between curves in the top graph. If this option is not specified, PROC PHREG finds all the variables that interact with the variable of interest. While examples in this class provide good examples of the above process for determining coefficients for CONTRAST and ESTIMATE statements, there are other statements available that perform means comparisons more easily. specifies the tolerance for testing the singularity of the Hessian matrix in the computation of the profile-likelihood confidence limits. If the observed pattern differs significantly from the simulated patterns, we reject the null hypothesis that the model is correctly specified, and conclude that the model should be modified. It is not necessary that the larger model be saturated. These provide some statistical background for survival analysis for the interested reader (and for the author of the seminar!). For any of the full-rank parameterizations, if an effect is not specified in the CONTRAST statement, all of its coefficients in the matrix are set to 0. Limitations on constructing valid LR tests. Note that the difference in log odds is equivalent to the log of the odds ratio. So, by exponentiating the estimated difference in log odds, an estimate of the odds ratio is provided. I am trying to run a Cox regression model, so I wrote this code. where \(d_i\) is the number who failed out of \(n_i\) at risk in interval \(t_i\).
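The comparison of observed and simulated cumulative residual patterns described above can be requested with the ASSESS statement in PROC PHREG. The following is a minimal sketch, not the original author's code. It assumes the WHAS500 data set with a follow-up time variable LENFOL, an event indicator FSTAT (0 = censored), and covariates GENDER, AGE, BMI, and HR.

```sas
/* Sketch only: variable names LENFOL and FSTAT are assumptions. */
ods graphics on;

proc phreg data=whas500;
  class gender;
  model lenfol*fstat(0) = gender age bmi hr;
  /* VAR= checks the functional form of each listed covariate using
     cumulative martingale residuals; PH checks proportional hazards.
     RESAMPLE requests supremum tests based on simulated paths. */
  assess var=(age bmi hr) ph / resample;
run;
```

A small supremum test p-value for a covariate suggests that the observed path is unusual relative to the simulated ones, which is the situation in which the model would be modified.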
Though assisting with the translation of a stated hypothesis into the needed linear combination is beyond the scope of the services that are provided by Technical Support at SAS, we hope that the following discussion and examples will help you. The contrast of the ten LS-means specified in the LSMESTIMATE statement estimates and tests the difference between the AB11 and AB12 LS-means. class gender; To do so: It appears that being in the hospital increases the hazard rate, but this is probably due to the fact that all patients were in the hospital immediately after heart attack, when they presumably are most vulnerable. The quantity value must be a positive number, with a default value of 1E-4. We see a sharper rise in the cumulative hazard right at the beginning of analysis time, reflecting the larger hazard rate during this period. With mixed models fit in PROC MIXED, if the models are nested in the covariance parameters and have identical fixed effects, then a LR test can be constructed using results from REML estimation (the default) or from ML estimation. Previously we suspected that the effect of bmi on the log hazard rate may not be purely linear, so it would be wise to investigate further. Let's interpret our model. The test of the difference is more easily obtained using the LSMESTIMATE statement. Finally, the CONTRAST and ESTIMATE statements use the contrast determined above to compute the AB11 - AB12 difference. We generally expect the hazard rate to change smoothly (if it changes) over time, rather than jump around haphazardly. The covariance matrix of the parameter estimator is computed as a sandwich estimate. Notice that the baseline hazard rate, \(h_0(t)\), is cancelled out, and that the hazard ratio does not depend on time \(t\). The hazard ratio \(HR\) will thus stay constant over time with fixed covariates. We can see this reflected in the survival function estimate for LENFOL=382.
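The cancellation of the baseline hazard mentioned above can be written out in one line. This is a sketch for a single covariate, where \(x_1\) and \(x_2\) are two hypothetical covariate values and \(\beta\) is the corresponding Cox regression coefficient:

\[HR = \frac{h(t \mid x_2)}{h(t \mid x_1)} = \frac{h_0(t)\exp(x_2\beta)}{h_0(t)\exp(x_1\beta)} = \exp\big(\beta(x_2 - x_1)\big)\]

The factor \(h_0(t)\) appears in both numerator and denominator, so the ratio contains no term in \(t\).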
For example, patients in the WHAS500 dataset are in the hospital at the beginning of follow-up time, which is defined by hospital admission after heart attack. In the code below we fit a Cox regression model where we examine the effects of gender, age, bmi, and heart rate on the hazard rate. The default is DIFF=ALL. For such studies, a semi-parametric model, in which we estimate regression parameters as covariate effects but ignore (leave unspecified) the dependence on time, is appropriate. The LSMEANS statement computes the cell means for the 10 A*B cells in this example. ALPHA=number specifies the level of significance \(\alpha\) for \(100(1-\alpha)\)% confidence intervals. run; proc phreg data = whas500; Examples of Writing CONTRAST and ESTIMATE Statements Introduction EXAMPLE 1: A Two-Factor Model with Interaction Computing the Cell Means Using the ESTIMATE Statement Estimating and Testing a Difference of Means A More Complex Contrast Comparing One Interaction Mean to the Average of All Interaction Means The DIFF and SLICEBY(A='1') options in the SLICE statement estimate the differences in LS-means at A=1. The null hypothesis, in terms of model 3e, is: We saw above that the first component of the hypothesis, log(OddsOA) = \(\mu\) + d + t1 + g1. If the elements of \(L\) are not specified for an effect that contains a specified effect, then the elements of the specified effect are distributed over the levels of the higher-order effect just as the GLM procedure does for its CONTRAST and ESTIMATE statements. If too many values are specified for an effect, the extra ones are ignored. run; proc phreg data = whas500(where=(id^=112 and id^=89)); Several covariates can be evaluated simultaneously. Thus, we again feel justified in our choice of modeling a quadratic effect of bmi. Thus, each term in the product is the conditional probability of survival beyond time \(t_i\), meaning the probability of surviving beyond time \(t_i\), given the subject has survived up to time \(t_i\).
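The Cox model with gender, age, bmi, and heart rate described above might be requested as in the sketch below. The PROC PHREG fragments in the text do not show the MODEL statement, so the time variable LENFOL, the event indicator FSTAT (0 = censored), and the heart rate variable HR are assumptions here.

```sas
/* Sketch only: LENFOL, FSTAT and HR are assumed variable names. */
proc phreg data=whas500;
  class gender;                        /* treat gender as categorical */
  /* time*status(censoring value) = covariates */
  model lenfol*fstat(0) = gender age bmi hr;
run;

/* The same model refit without two suspected outliers,
   using the WHERE= data set option that appears in the text. */
proc phreg data=whas500(where=(id^=112 and id^=89));
  class gender;
  model lenfol*fstat(0) = gender age bmi hr;
run;
```

Comparing the parameter estimates from the two runs shows how sensitive each coefficient is to the excluded subjects.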
PROC GENMOD produces the Wald statistic when the WALD option is used in the CONTRAST statement. Therneau, TM, Grambsch PM, Fleming TR (1990). As an example, imagine subject 1 in the table above, who died at 2,178 days, was in a treatment group of interest for the first 100 days after hospital admission. In PROC LOGISTIC, odds ratio estimates for variables involved in interactions can be most easily obtained using the ODDSRATIO statement. As we see above, one of the great advantages of the Cox model is that estimating predictor effects does not depend on making assumptions about the form of the baseline hazard function, \(h_0(t)\), which can be left unspecified. In other words, the average of the Schoenfeld residuals for coefficient \(p\) at time \(k\) estimates the change in the coefficient at time \(k\). /*class exposure*/ model period*outcome(0)=exposure / rl; run; Hello @MTeck and welcome to the SAS Support Communities! Here, we would like to introduce two types of interaction: We would probably prefer this model to the simpler model with just gender and age as explanatory factors for a couple of reasons. Indicator or dummy coding of a predictor replaces the actual variable in the design matrix (or model matrix) with a set of variables that use values of 0 or 1 to indicate the level of the original variable. Thus, it might be easier to think of \(df\beta_j\) as the effect of including observation \(j\) on the coefficient. Now let's look at the model with both linear and quadratic effects for bmi. We request Cox regression through proc phreg in SAS. proc sgplot data = dfbeta; All of these variables vary quite a bit in these data. The model is the same as model (1) above with just a change in the subscript ranges. By default, value is the machine epsilon times 1E7, which is approximately 1E-9. Computed statistics are based on the asymptotic chi-square distribution of the Wald statistic.
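The \(df\beta_j\) diagnostics discussed above can be saved with the OUTPUT statement and then plotted with PROC SGPLOT. This is a sketch under assumptions: the variable names LENFOL, FSTAT, and HR, and the output names dfgender, dfage, dfbmi, and dfhr, are illustrative choices rather than names taken from the original code.

```sas
/* Sketch only: one DFBETA variable is named per model parameter,
   in the order the parameters appear in the MODEL statement. */
proc phreg data=whas500;
  class gender;
  model lenfol*fstat(0) = gender age bmi hr;
  output out=dfbeta dfbeta=dfgender dfage dfbmi dfhr;
run;

/* Plot the influence of each subject on the bmi coefficient,
   labelling points with the subject ID to spot outliers. */
proc sgplot data=dfbeta;
  scatter x=bmi y=dfbmi / markerchar=id;
run;
```

Points far from zero identify subjects whose inclusion changes the bmi coefficient the most.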
If too few values are specified, the remaining ones are set to 0. This coding scheme is used by default by PROC CATMOD and PROC LOGISTIC and can be specified in these and some other procedures such as PROC GENMOD with the PARAM=EFFECT option in the CLASS statement. \[df\beta_j \approx \hat{\beta} - \hat{\beta}_j\] run; proc phreg data=whas500 plots=survival; In an example from Ries and Smith (1963), the choice of detergent brand (Brand= M or X) is related to three other categorical variables: the softness of the laundry water (Softness= soft, medium, or hard); the temperature of the water (Temperature= high or low); and whether the subject was a previous user of Brand M (Previous= yes or no). The SLICE and LSMEANS statements cannot be used for this more complex contrast. INTRODUCTION The PROC LIFEREG and the PROC PHREG procedures both can do survival analysis using time-to-event data. Construction and Computation of Estimable Functions, Specifies a list of values to divide the coefficients, Suppresses the automatic fill-in of coefficients for higher-order effects, Tunes the estimability checking difference, Determines the method for multiple comparison adjustment of estimates, Performs one-sided, lower-tailed inference, Adjusts multiplicity-corrected p-values further in a step-down fashion, Specifies values under the null hypothesis for tests, Performs one-sided, upper-tailed inference, Displays the correlation matrix of estimates, Displays the covariance matrix of estimates, Produces a joint or chi-square test for the estimable functions, Requests ODS statistical graphics if the analysis is sampling-based, Specifies the seed for computations that depend on random numbers. For a more detailed definition of nested and nonnested models, see the Clarke (2001) reference cited in the sample program. However, this is something that cannot be estimated with the ODDSRATIO statement, which only compares odds of levels of a specified variable.
Indeed, exclusion of these two outliers causes an almost doubling of \(\hat{\beta}_{bmi}\), from -0.23323 to -0.39619. Notice that the difference in log odds for these two cells (1.02450 - 0.39087 = 0.63363) is the same as the log odds ratio estimate that is provided by the CONTRAST statement. This indicates that our choice of modeling a linear and quadratic effect of bmi was a reasonable one. This seminar introduces procedures and outlines the coding needed in SAS to model survival data through both of these methods, as well as many techniques to evaluate and possibly improve the model. The EXP option exponentiates each difference, providing odds ratio estimates for each pair. Most of the time we will not know a priori the distribution generating our observed survival times, but we can get an idea of what it looks like using nonparametric methods in SAS with proc univariate. Grambsch, PM, Therneau, TM, Fleming TR. These results come from the LSMESTIMATE statement. The Cox model contains no explicit intercept parameter, so it is not valid to specify one in the CONTRAST statement. Suppose that you suspect that the survival function is not the same among some of the groups in your study (some groups tend to fail more quickly than others). Thus, by 200 days, a patient has accumulated quite a bit of risk, which accumulates more slowly after this point. format gender gender. Computing the Cell Means Using the ESTIMATE Statement Finally, writing the hypothesis \(\mu_{12} - \frac{1}{6}\sum_{ij}\mu_{ij}\) in terms of the model results in these contrast coefficients: 0 for \(\mu\), 1/2 and -1/2 for A, -1/3, 2/3, and -1/3 for B, and -1/6, 5/6, -1/6, -1/6, -1/6, and -1/6 for AB. In PROC LOGISTIC, the ESTIMATE=BOTH option in the CONTRAST statement requests estimates of both the contrast (difference in log odds or log odds ratio) and the exponentiated contrast (odds ratio).
Then there are three parameters (\(\beta_1, \beta_2, \beta_3\)) representing the first three levels, and the fourth parameter is represented by \(-\beta_1 - \beta_2 - \beta_3\). To test the first versus the fourth level of A, you would test \(\beta_1 = -\beta_1 - \beta_2 - \beta_3\), or equivalently \(2\beta_1 + \beta_2 + \beta_3 = 0\). Grambsch and Therneau (1994) show that a scaled version of the Schoenfeld residual at time \(k\) for a particular covariate \(p\) will approximate the change in the regression coefficient at time \(k\): \[E(s^\star_{kp}) + \hat{\beta}_p \approx \beta_p(t_k)\] The value number must be between 0 and 1; the default value is 0.05, which results in 95% intervals. So the log odds are: For treatment C in the complicated diagnosis, O = 1, A = -1, B = -1. run; proc lifetest data=whas500 atrisk nelson; Not only are we interested in how influential observations affect coefficients, we are interested in how they affect the model as a whole. Firth's Correction for Monotone Likelihood, Conditional Logistic Regression for m:n Matching, Model Using Time-Dependent Explanatory Variables, Time-Dependent Repeated Measurements of a Covariate, Survivor Function Estimates for Specific Covariate Values, Model Assessment Using Cumulative Sums of Martingale Residuals, Bayesian Analysis of Piecewise Exponential Model. However, each of the other 3 at the higher smoothing parameter values have very similar shapes, which appears to be a linear effect of bmi that flattens as bmi increases. In other words, we would expect to find a lot of failure times in a given time interval if 1) the hazard rate is high and 2) there are still a lot of subjects at-risk. Besides using the solution option to get the parameter estimates, Nonparametric methods provide simple and quick looks at the survival experience, and the Cox proportional hazards regression model remains the dominant analysis method.
I'm not into statistics, so I'm just guessing what value you mean - here's an example I think could help you: ods trace on; ods output ParameterEstimates=work.my_estimates_dataset; proc phreg data=sashelp.class; model age = height; run; ods trace off; This uses the Output Delivery System (ODS) component of Base SAS. EXAMPLE 2: A Three-Factor Model with Interactions Here is the syntax for the CONTRAST statement. Can I add a CLASS statement if I want to see hazard ratios on exposure? The second model is a reduced model that contains only the main effects. The following examples concentrate on using the steps above in this situation. The E option, described later in this section, enables you to verify the proper correspondence of values to parameters. In each of the graphs above, a covariate is plotted against cumulative martingale residuals. However, if the nested models do not have identical fixed effects, then results from ML estimation must be used to construct a LR test. It is possible that the relationship with time is not linear, so we should check other functional forms of time, such as log(time) and rank(time). run; output out=residuals resmart=martingale; Hazard ratios are computed at each value of the list if the list is specified, or at each level of the interacting variable if ALL is specified, or at the reference level of the interacting variable if REF is specified. This is the default coding scheme for CLASS variables in most procedures including GLM, MIXED, GLIMMIX, and GENMOD. Any estimable linear combination of model parameters can be tested using the procedure's CONTRAST statement. This can be particularly difficult with dummy (PARAM=GLM) coding. You can estimate the contrast or the exponentiated contrast, or both, by specifying one of the following keywords: specifies that the contrast itself be estimated.
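The list, ALL, and REF choices described above belong to the AT option of the HAZARDRATIO statement in PROC PHREG. The sketch below shows the list and ALL forms. It assumes a WHAS500 model with a gender by age interaction, and the variable names LENFOL and FSTAT and the age values 40, 60, and 80 are illustrative assumptions.

```sas
/* Sketch only: variable names and the AT list values are assumptions. */
proc phreg data=whas500;
  class gender;
  model lenfol*fstat(0) = gender|age bmi hr;
  /* Hazard ratio for a 1-unit increase in age, at each level of gender */
  hazardratio 'Age effect by gender' age / at(gender=ALL);
  /* Hazard ratio comparing genders, at three chosen ages */
  hazardratio 'Gender effect at selected ages' gender / at(age=(40 60 80));
run;
```

Because gender and age interact in this model, a single hazard ratio for either variable would not be meaningful, which is why the ratios are evaluated at specific values of the other variable.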
One can request that SAS estimate the survival function by exponentiating the negative of the Nelson-Aalen estimator, also known as the Breslow estimator, rather than by the Kaplan-Meier estimator through the method=breslow option on the proc lifetest statement. The calculation of the statistic for the nonparametric Log-Rank and Wilcoxon tests is given by: \[Q = \frac{\bigg[\sum\limits_{j=1}^m w_j(d_{ij}-\hat e_{ij})\bigg]^2}{\sum\limits_{j=1}^m w_j^2\hat v_{ij}}\] Summing over the entire interval, then, we would expect to observe \(x\) failures, as \(\frac{x}{t}t = x\) (assuming repeated failures are possible, such that failing does not remove one from observation). Next, we illustrate the combination of these statements by following two examples. The null distribution of the cumulative martingale residuals can be simulated through zero-mean Gaussian processes. Cox models are typically fitted by maximum likelihood methods, which estimate the regression parameters that maximize the probability of observing the given set of survival times. The primary focus of survival analysis is typically to model the hazard rate, which has the following relationship with \(f(t)\) and \(S(t)\): \(h(t) = f(t)/S(t)\). The hazard function, then, describes the relative likelihood of the event occurring at time \(t\) (\(f(t)\)), conditional on the subject's survival up to that time \(t\) (\(S(t)\)). The parameter for ses1 is the difference The value that you specify in the option divides all the coefficients that are provided in the ESTIMATE statement. However, coefficients for the B effect remain in addition to coefficients for the A*B interaction effect. Create a variable called CENSOR. variable for ses =2. Graphs are particularly useful for interpreting interactions. In PROC LOGISTIC, use the PARAM=GLM option in the CLASS statement to request dummy coding of CLASS variables. To get the expected mean The rows of \(L\) are specified in order and are separated by commas.
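The Log-Rank and Wilcoxon statistics defined above are computed by PROC LIFETEST when a STRATA statement names the grouping variable. The following is a minimal sketch that compares survival between genders in WHAS500. The time variable LENFOL and event indicator FSTAT (0 = censored) are assumptions.

```sas
/* Sketch only: LENFOL and FSTAT are assumed variable names. */
proc lifetest data=whas500 atrisk;
  time lenfol*fstat(0);
  /* TEST= selects which weighted tests of equality to report */
  strata gender / test=(logrank wilcoxon);
run;
```

The two tests differ only in the weights \(w_j\): the log-rank test weights all event times equally, while the Wilcoxon test weights each event time by the number at risk and so emphasizes early differences.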
It is available only for the Bayesian analysis. With any procedure, models that are not nested cannot be compared using the LR test. Instead, you model a function of the response distribution's mean. The survival function is undefined past this final interval at 2358 days. Example 3: using the CONTRAST statement to do comparison: When we set the reference levels to be REF='NEV' for TOBHX and REF='GP' for RND, we need to manually set the contrast parameters for each comparison in the CONTRAST statement. The following parameters are specified in the CONTRAST statement: identifies the contrast on the output. These statements include the LSMEANS, LSMESTIMATE, and SLICE statements that are available in many procedures. The contrast estimate is exponentiated to yield the odds ratio estimate. If PROC PHREG finds a contrast to be nonestimable, it displays missing values in corresponding rows in the results. As an example, suppose that you intend to use PROC REG to perform a linear regression, and you want to capture the R-square value in a SAS data set. SAS Code from All of These Examples. However, it can happen (and it did in your example) that the CLASS statement uses level '1' of that explanatory variable as the reference level, so that the sign of the corresponding parameter estimate changes and the inverse hazard ratio and confidence limits are computed, here: the hazard ratio of "no exposure" vs. Since treatment A and treatment C are the first and third in the LSMEANS list, the contrast in the LSMESTIMATE statement estimates and tests their difference.