# Survival Analysis – Chapter 1 – Introduction to Survival Analysis

One of my goals this summer is to familiarize myself with the concepts and methods of survival analysis. Not only will this add another “tool” to my epidemiological “toolbox”, but it will also help me prepare for the comprehensive examination this August. Each week I intend to read and blog about one chapter of the text, Survival Analysis: A Self-Learning Text, Second Edition by David G. Kleinbaum and Mitchel Klein. Again, I only want to highlight the main points, especially the important concepts likely to be covered on the comps.

Chapter 1 – Introduction to Survival Analysis

• Survival analysis addresses problems in which the main outcome is the time until an event occurs.
• There is time, which may be measured in years, months, weeks, or days from the start of follow-up, or as the individual’s age when the event occurs.
• There is the event, which is some outcome of interest such as death, onset of morbidity, or remission. Usually the event is a dichotomous variable.
• When more than one outcome event is of interest, a more complicated competing risks analysis is needed.
• The event is also sometimes called the failure.
• Censoring is an important concept in survival analysis. It occurs when an individual’s exact survival time is unknown because the failure event was not observed. This can happen because the individual does not experience the event before the end of the study, withdraws from the study, or is lost to follow-up.
• This type of censoring is called right-censoring: the true survival time is known only to be at least as long as the observed time.
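
To make the three censoring scenarios concrete, here is a minimal Python sketch of my own (not from the text). The function name `observe` and its arguments are hypothetical. It assumes each subject has an optional true event time and an optional time of withdrawal or loss to follow-up, both measured from study entry, and it returns the pair that actually gets recorded: the observed time and a status flag.

```python
from typing import Optional, Tuple


def observe(event_time: Optional[float],
            exit_time: Optional[float],
            study_end: float) -> Tuple[float, int]:
    """Return (observed_time, status) for one subject.

    status = 1 means the event was observed; status = 0 means the
    subject is right-censored at observed_time.
    All names here are illustrative assumptions, not the book's notation.
    """
    # Last moment at which the subject was still under observation:
    # either the end of the study or an earlier withdrawal / loss to follow-up.
    last_follow_up = study_end if exit_time is None else min(exit_time, study_end)

    # The event counts only if it happens while the subject is still observed.
    if event_time is not None and event_time <= last_follow_up:
        return event_time, 1

    # Otherwise we only know that survival time is at least last_follow_up.
    return last_follow_up, 0


if __name__ == "__main__":
    print(observe(5, None, 12))     # event seen during the study
    print(observe(None, None, 12))  # no event before the study ends
    print(observe(15, None, 12))    # event happens after the study ends
    print(observe(9, 4, 12))        # withdrew at time 4, before the event
```

In every censored case the recorded time is a lower bound on the true survival time, which is what makes it right-censoring.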

• Survival analysis can be described mathematically in relation to two related functions: the survival function and the hazard function.
• The survival function, S(t), is the probability that a person’s survival time (T, a random variable) is greater than some specified time (t). S(t) = P(T>t)
• There are some important properties of the survival function that should be noted:
• 1) Since it is a probability, S(t) has a range between 0 and 1.
• 2) S(t) is non-increasing: S(0) = 1, S(t) falls or stays level as t increases, and S(t) approaches 0 as t goes to infinity.
• In theory, S(t) is a smooth curve from 1 to 0, but in practice the estimated curve is a step function that may not reach zero if the study ends before all subjects have experienced the event.
• The other important function is the hazard function, h(t), which represents the instantaneous risk per unit of time of the event occurring, given survival up to time t. Formally, h(t) is the limit as Δt → 0 of P(t ≤ T < t + Δt | T ≥ t) / Δt.
• The key point about the hazard function is that it is a rate and not a probability, so it can take any value from 0 to infinity.
• When the hazard function is constant over time (a horizontal line, h(t) = λ), the survival model is called exponential, since S(t) = exp(−λt).
• The survival and hazard functions are mathematically related, such that knowing one allows the other to be derived: S(t) = exp(−∫ h(u) du), with the integral taken from 0 to t, and h(t) = −[dS(t)/dt] / S(t).
• There are three main goals of survival analysis:
1. Estimate and interpret the survival and/or hazard function for survival data.
2. Compare survival/hazard functions.
3. Assess the relationship between explanatory variables and survival time.
• One way of laying out survival data is to give each subject one row, with columns for survival time (t), failure status (d, indicating whether the subject experienced the event or was censored), and the explanatory variables (X). This is likely the form of the data needed for analysis by computer.
• Another layout that facilitates survival analysis has one row for each distinct uncensored (failure) time, in ascending order and beginning with t=0. Each row has a column for the survival time, the number of failures at that time, the number of observations censored from that time up to (but not including) the next failure time, and the size of the risk set, meaning the subjects who have survived at least to that time.
• The average survival time (T-bar) is simply the sum of the survival times for all subjects (censored and non-censored) divided by the number of subjects.
• The average hazard rate (h-bar) is the number of failures divided by the sum of the survival times (censored and non-censored).
• Median survival time may also be a useful descriptive statistic of survival curves. It is the time (t) at which the survival probability S(t) equals 0.5, that is, the time by which half of the subjects have experienced the event.
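
To check my understanding of the two data layouts and the two averages, here is a rough Python sketch. The data are made up by me purely for illustration (they are not from the book), and the function names are hypothetical. For the second layout I am assuming that failures are counted at each distinct failure time, that censored observations are counted from that failure time up to (but not including) the next one, and that all survival times are greater than zero.

```python
# Made-up toy data (NOT from the text): one row per subject,
# (survival time in weeks, status d) with d = 1 for failure, d = 0 for censored.
subjects = [(2, 1), (3, 1), (3, 0), (5, 1), (6, 0), (8, 1), (9, 0), (12, 0)]


def ordered_failure_table(data):
    """Build the second layout: one row per distinct failure time, starting at t=0.

    Each row is (t_j, failures at t_j, censored in [t_j, t_next), risk set size).
    Assumes every survival time is > 0, so the t=0 row has no failures.
    """
    failure_times = sorted({t for t, d in data if d == 1})
    starts = [0] + failure_times
    rows = []
    for j, t_j in enumerate(starts):
        # The last interval is open-ended on the right.
        t_next = starts[j + 1] if j + 1 < len(starts) else float("inf")
        failures = sum(1 for t, d in data if d == 1 and t == t_j)
        censored = sum(1 for t, d in data if d == 0 and t_j <= t < t_next)
        # Risk set: everyone whose observed time is at least t_j.
        at_risk = sum(1 for t, _ in data if t >= t_j)
        rows.append((t_j, failures, censored, at_risk))
    return rows


def mean_survival_time(data):
    """T-bar: sum of all observed times (censored or not) / number of subjects."""
    return sum(t for t, _ in data) / len(data)


def average_hazard(data):
    """h-bar: number of failures / sum of all observed times."""
    return sum(d for _, d in data) / sum(t for t, _ in data)


if __name__ == "__main__":
    for row in ordered_failure_table(subjects):
        print(row)
    print("T-bar:", mean_survival_time(subjects))
    print("h-bar:", average_hazard(subjects))
```

With this toy data the survival times sum to 48 weeks across 8 subjects with 4 failures, so T-bar is 48/8 = 6 weeks and h-bar is 4/48, or about 0.083 failures per person-week.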