One of my goals this summer is to familiarize myself with the concepts and methods of survival analysis. Not only will this add another “tool” to my epidemiological “toolbox”, but it will help me prepare for the comprehensive examination this August. Each week I intend to read and blog about one chapter of the text, *Survival Analysis: A Self-Learning Text, Second Edition* by David G. Kleinbaum and Mitchel Klein. As before, I only want to highlight the main points, especially the important concepts likely to be covered on the comps.

**Chapter 1 – Introduction to Survival Analysis**

- The problem that survival analysis solves is analyzing *the time to an event* as the main outcome.
  - There is **time**, which can be continuous or discrete, such as age.
  - There is the **event**, which is some outcome of interest such as death, onset of morbidity, or remission. Usually the event is a dichotomous variable.
    - When more than one outcome event is of interest, a more complicated **competing risks** analysis is needed.
    - The event is also sometimes called the **failure**.
- **Censoring** is an important concept in survival analysis. It occurs when an individual’s survival time is not known exactly because the failure event was not observed. This can happen because the individual did not experience the event before the end of the study, withdrew from the study, or was lost to follow-up.
  - This type of censoring is called **right censoring**: the true survival time is at least as long as the observed follow-up time.
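
A minimal Python sketch may make right censoring more concrete. The function name `observe` and the numbers are made up for illustration; in real data the true event time of a censored subject is of course unknown, so this only shows how the recorded pair (time, failure indicator) arises.

```python
def observe(event_time, censor_time):
    """Return (t, d): the recorded survival time and the failure indicator.

    event_time  -- hypothetical true time of the event
    censor_time -- time at which observation stops (study end,
                   withdrawal, or loss to follow-up)
    """
    if event_time <= censor_time:
        return event_time, 1   # event observed: survival time is known exactly
    return censor_time, 0      # right-censored: true time exceeds censor_time


# Three made-up subjects: one failure and two censored observations.
records = [observe(5, 12), observe(20, 12), observe(9, 7)]
print(records)  # [(5, 1), (12, 0), (7, 0)]
```

For the two censored subjects, all that is known is that the survival time is longer than 12 and 7, respectively.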

- Survival analysis can be described mathematically in terms of two related functions: the **survival function** and the **hazard function**.
- The **survival function**, **S(t)**, is the probability that a person’s survival time (T, a random variable) is greater than some specified time (t): S(t) = P(T > t).
  - There are some important properties of the survival function that should be noted:
    - 1) Since it is a probability, S(t) ranges between 0 and 1.
    - 2) S(t) is always non-increasing. At time 0, S(t) is 1, and it decreases as time increases. As t approaches infinity, S(t) approaches zero.
  - In theory, S(t) is a *smooth curve* from 1 to 0, but in practice the estimated curve is a **step function** that may not reach zero if the study ends before all subjects have experienced the event.
- The other important function is the **hazard function**, **h(t)**, which represents the instantaneous risk per unit of time of the event occurring, given survival up to time t.
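
To see why an estimated survival curve is a step function, here is a minimal sketch that estimates S(t) = P(T > t) as the proportion of subjects whose survival time exceeds t. It assumes no censoring at all, which is a simplification, and the survival times and the name `empirical_survival` are made up for illustration.

```python
def empirical_survival(times, t):
    """Proportion of subjects whose survival time exceeds t.

    Assumes every survival time is fully observed (no censoring).
    """
    return sum(1 for x in times if x > t) / len(times)


times = [2, 3, 3, 5, 8]  # made-up survival times, e.g. in weeks

# Evaluate the curve at time 0 and at each observed failure time.
curve = [(t, empirical_survival(times, t)) for t in [0, 2, 3, 5, 8]]
print(curve)  # [(0, 1.0), (2, 0.8), (3, 0.4), (5, 0.2), (8, 0.0)]
```

The estimate starts at 1, never increases, and only changes at the observed failure times, so it is flat between them.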

- The key point about the hazard function is that it is a *rate* and not a probability, so it can take any value from 0 to infinity.
- When the hazard function is constant (a horizontal line), the survival model is called **exponential**.
- The survival and hazard functions are mathematically related, such that knowing the survival function allows one to derive the hazard function, and vice versa.
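
Written out, the hazard function and its two-way relationship with the survival function are usually given in the following standard forms. Here u is just a dummy variable of integration and λ is an assumed symbol for a constant hazard.

$$
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
$$

$$
S(t) = \exp\left[-\int_0^t h(u)\,du\right], \qquad h(t) = -\frac{dS(t)/dt}{S(t)}
$$

In the constant-hazard case, h(t) = λ for all t, the first relationship gives

$$
S(t) = e^{-\lambda t},
$$

which is why a constant hazard corresponds to the exponential model.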

- There are three main **goals of survival analysis:**
  - Estimate and interpret the survival and/or hazard function for survival data.
  - Compare survival/hazard functions.
  - Assess the relationship between explanatory variables and survival time.
- One way of laying out survival data is to give each individual subject a row, with the following columns: survival time (t), failure status (d, whether the subject experienced the event or was censored), and an array of explanatory variables (X). This is likely the form of the data needed for analysis by computer.
- Another data layout that facilitates survival analysis has one row for each unique uncensored (failure) time, in ascending order and beginning with t=0. For each row, there is a column for the failure time, the # of failures at that time, the # of observations censored from that time up to (but not including) the next failure time, and the total obs in the risk set, meaning the subjects who have survived at least to that time.
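
As a rough sketch of how the second layout can be built from the first, the Python below turns a list of (t, d) pairs into one row per ordered failure time. The function name `ordered_failure_table` and the data are made up for illustration. It counts censored observations from each row's failure time up to, but not including, the next failure time, which I believe is the textbook's convention, and it assumes all survival times are greater than zero.

```python
def ordered_failure_table(records):
    """Build rows (t_j, failures, censored, at_risk) from (t, d) pairs.

    records -- list of (t, d); d = 1 for a failure, d = 0 for censored.
    Assumes every survival time is strictly positive.
    """
    failure_times = sorted({t for t, d in records if d == 1})
    starts = [0] + failure_times  # first row is t = 0, before any failure
    rows = []
    for j, start in enumerate(starts):
        # The interval runs up to the next failure time (or forever).
        end = starts[j + 1] if j + 1 < len(starts) else float("inf")
        failures = sum(1 for t, d in records if d == 1 and t == start)
        censored = sum(1 for t, d in records if d == 0 and start <= t < end)
        at_risk = sum(1 for t, d in records if t >= start)  # risk set size
        rows.append((start, failures, censored, at_risk))
    return rows


# Made-up data in the one-row-per-subject layout: (survival time, status).
records = [(2, 1), (3, 0), (4, 1), (4, 1), (4, 0), (6, 0), (7, 1), (9, 0)]
table = ordered_failure_table(records)
for row in table:
    print(row)
# (0, 0, 0, 8)
# (2, 1, 1, 8)
# (4, 2, 2, 6)
# (7, 1, 1, 2)
```

A handy check is that each risk set equals the previous risk set minus the previous row's failures and censored observations.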

- The **average survival time** (T-bar) is simply the sum of the survival times for all subjects (censored and non-censored) divided by the number of subjects.
- The **average hazard rate** (h-bar) is the number of failures divided by the sum of the survival times (censored and non-censored).
- **Median survival time** may also be a useful descriptive statistic of survival curves. It is the time (t) at which 50% of the subjects are still surviving.
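
The two averages above are simple enough to compute by hand, but a short sketch shows the arithmetic. The data and function names are made up for illustration; each record is a (survival time, failure status) pair.

```python
def average_survival_time(records):
    """T-bar: sum of all survival times divided by the number of subjects."""
    return sum(t for t, _ in records) / len(records)


def average_hazard_rate(records):
    """h-bar: number of failures divided by the sum of all survival times."""
    return sum(d for _, d in records) / sum(t for t, _ in records)


# Made-up data: d = 1 is a failure, d = 0 is a censored observation.
records = [(2, 1), (3, 0), (4, 1), (4, 1), (4, 0), (6, 0), (7, 1), (9, 0)]

t_bar = average_survival_time(records)  # 39 / 8
h_bar = average_hazard_rate(records)    # 4 / 39

print(t_bar)            # 4.875
print(round(h_bar, 3))  # 0.103
```

Note that both statistics use the censored survival times in the sum, but only failures count in the numerator of h-bar.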