Monday, December 22, 2008

R tips 2

When you attach a data frame, let's call it 'dat', you end up with two copies of 'dat': one in the usual R workspace, which is listed by ls(), and another on the search path, which is listed by search(). Suppose 'dat' contains x and y, where y = x^2. After attach(), you can plot y against x without writing the data frame name. Now create a new variable z = exp(x), where x is a variable in 'dat'. Where is z, in 'dat' or elsewhere? The answer is that z is a stand-alone vector outside the 'dat' data frame. It is neither in the workspace copy of 'dat' nor in the attached copy on the search path. You can force z to be a variable in 'dat' by assigning to dat$z, but that changes only the workspace copy. The attached copy still has no z, so a plain z in later commands still refers to the stand-alone vector.

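attach(dat)
plot(x, y)         # y against x; both come from the attached copy
z <- exp(x)        # z is a stand-alone vector in the workspace
dat$z <- exp(x)    # adds z to the workspace copy of dat only
plot(x, z)         # uses the stand-alone z, not dat$z
detach(dat)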
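
The behaviour described above can be checked with a small made-up data frame. This is only a sketch: the values in dat are invented for illustration, and the plots are replaced by printed checks.

```r
# Hypothetical data frame for illustration: y = x^2
dat <- data.frame(x = 1:5)
dat$y <- dat$x^2

attach(dat)
z <- exp(x)        # x is found in the attached copy; z is created outside dat
dat$z <- exp(x)    # modifies the workspace copy of dat only

in_workspace_copy <- "z" %in% names(dat)   # TRUE
in_attached_copy  <- "z" %in% ls("dat")    # FALSE: the attached copy is unchanged
print(c(workspace = in_workspace_copy, attached = in_attached_copy))

detach(dat)
still_there <- exists("z")   # TRUE: the stand-alone z survives detach()
print(still_there)
```

Here ls("dat") lists the variables of the attached copy, because attach() puts it on the search path under the name "dat".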

I introduced the transform() function in the previous tips. Here I will introduce another useful function, with(). Instead of attaching a data frame, you can refer to its variables by name inside a call to with(). Look at the following modification of the code above.

with(dat, plot(x, y))
dat <- transform(dat, z = exp(x))
with(dat, plot(x, z))

Notice that with() lets you work with the variables in a data frame by name, just as transform() does. Now you know where z is (inside 'dat'), and you cannot forget to detach anything because you never attached anything.
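
A minimal sketch of the same idea with an invented data frame (the values are assumptions for illustration, and mean() stands in for the plots so the result can be printed):

```r
# Hypothetical data frame for illustration: y = x^2
dat <- data.frame(x = 1:5)
dat$y <- dat$x^2

path_before <- search()

mean_y <- with(dat, mean(y))         # y is looked up inside dat
dat <- transform(dat, z = exp(x))    # z is created inside dat

path_after <- search()

print(names(dat))                          # "x" "y" "z"
print(mean_y)                              # 11
print(identical(path_before, path_after))  # TRUE: nothing was attached
```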

Sunday, December 14, 2008

Poisson regression, 2009

Hi,

These are the materials for the Poisson regression course, 2009. This year I use R version 2.10.0, and some regression outputs are slightly different from the 2007 version, where R version 2.6.2 was used. However, differences in glm() output at the third or later decimal place are very common when you change R versions. This is a technical feature of the maximum likelihood estimation method, not a bug. None of the results is wrong. It means that you should not pay too much attention to the digits beyond the second decimal place.

If you are eager to know why the maximum likelihood estimation method is that tricky, please consult a textbook on the subject.
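
One way to see why the later digits are not fixed is that glm() fits the model iteratively and stops when a convergence tolerance is met. The sketch below is only an illustration under stated assumptions: the data are simulated, not the course data, and the two tolerance values are arbitrary choices. It does not reproduce the difference between R versions.

```r
# Simulated Poisson data (an assumption for illustration, not the course data)
set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- rpois(50, lambda = exp(0.5 + x))

# Same model, fitted with a loose and a tight stopping tolerance
fit_loose <- glm(y ~ x, family = poisson, control = glm.control(epsilon = 1e-4))
fit_tight <- glm(y ~ x, family = poisson, control = glm.control(epsilon = 1e-10))

print(coef(fit_loose), digits = 10)
print(coef(fit_tight), digits = 10)

# Largest absolute difference between the two sets of coefficients
max_diff <- max(abs(coef(fit_loose) - coef(fit_tight)))
print(max_diff)
```

Any difference between the two fits should show up only in the later digits, which is why the first two decimal places are the ones to report.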

montana.dat
montana.dta
docsmoke.dat
docsmoke.dta
welsh.dat
welsh.dta

The functions for Poisson regression can be downloaded here.
poissonfun.R

The module can be downloaded here.

Poisson0812.pdf

You can also download the ICE modules, ice and epid. However, they are not necessary for this session.

Saturday, December 13, 2008

R tips 1

1. You can load an R package with the library() function. Did you know that require() works similarly and also returns a logical value (TRUE if the package was loaded, FALSE otherwise)?

2. Different packages sometimes contain functions with the same name. When that happens, R calls the version from the package that was attached most recently, so you can choose which one you get by loading the package you want just before calling the function. For example,

library(epicalc)
summ(xxx)
# Calls summ() from the epicalc package.
library(ice)
summ(xxx)
# Calls summ() from the ice package, which now masks the epicalc version.

3. There are at least two ways to create or modify a variable in a data frame: with $, or with the transform() function. See the example below.

dat$var2 <- as.numeric(dat$var1 != 0)
# var2 is 0 where var1 is 0 and 1 otherwise.
dat <- transform(dat, var2 = as.numeric(var1 != 0))
# Remember that transform() returns a whole data frame, not a single variable, so assign the result back to dat.
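
Tips 1 and 2 can be tried without installing anything. The sketch below uses only the stats package that ships with R. The package name "notARealPackage123" is made up to show the FALSE case, and the user-defined sd() is a stand-in for a function that masks another one with the same name. The package::function form shown at the end is an alternative to relying on the loading order.

```r
# require() returns TRUE or FALSE instead of stopping with an error
ok_stats <- require(stats)    # stats ships with R
# "notARealPackage123" is a made-up name, used only to show the FALSE case
ok_fake <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
print(c(stats = ok_stats, fake = ok_fake))

# Mask sd() with a user-defined function of the same name
sd <- function(x) "my own sd"
print(sd(c(1, 2, 3)))          # the masking version is called
print(stats::sd(c(1, 2, 3)))   # package::function picks the stats version
```

Run rm(sd) afterwards to remove the masking function from the workspace.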

Friday, December 12, 2008

Logistic regression course II, 2010, Conditional logistic regression

Hi,

These are the materials for logistic regression course II: Conditional logistic regression, 2010. This year I use R version 2.12.0, and the regression outputs are no different from the 2009 version, where R version 2.10.0 was used. However, differences in glm() output at the third or later decimal place are very common when you change R versions. This is a technical feature of the maximum likelihood estimation method, not a bug. None of the results is wrong. It means that you should not pay too much attention to the digits beyond the second decimal place.

If you are eager to know why the maximum likelihood estimation method is that tricky, please consult a textbook on the subject.

Conditional logistic regression requires the survival package, which is already installed in your R library folder. Just call library(survival) or require(survival) and you are ready to use the clogit() function.
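
As a rough sketch of what a clogit() call looks like, the example below uses the infert data set that ships with R instead of the course data sets. The choice of data set and of the variables in the formula is an assumption for illustration only. The matched sets are identified with strata().

```r
library(survival)

# infert is a matched case-control data set included with R;
# the 'stratum' column identifies the matched sets.
fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)

print(coef(fit))        # log odds ratios
print(exp(coef(fit)))   # odds ratios
```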

agechd.dta
cca-match.dta

The module can be downloaded here.

logistic1012-2.pdf

You can also download ICE modules from
r-ice.project.net

The required modules for this course are ice and epid.

Finally, you can follow my R script file below. Don't forget to change the working directory to your own.

conditional.R

Logistic regression course I, 2010

Hi,

These are the materials for logistic regression course I, 2010. This year I use R version 2.12.0, and the regression outputs are no different from the 2009 version, where R version 2.10.0 was used. However, differences in glm() output at the third or later decimal place are very common when you change R versions. This is a technical feature of the maximum likelihood estimation method, not a bug. None of the results is wrong. It means that you should not pay too much attention to the digits beyond the second decimal place.

If you are eager to know why the maximum likelihood estimation method is that tricky, please consult a textbook on the subject.

anc.dta
agechd.dta
cca.dta
lowbwt.dta

The revised module can be downloaded here.

logistic1012-1.pdf

You can also download ICE modules from

www.r-ice-project.net

The required modules for this course are ice and epid.

Finally, you can try following my R script file. Please change the working directory to your own; otherwise the script will not run correctly.

exercises.R