Introduction to Programming
This section looks at running commands from a source file. It also
gives an overview of the control-flow statements that determine which
code is executed by the interpreter.
The next section shows how to execute the commands in a file using
the source command. The remaining sections list the flow control
options that are available in the R language definition. The language
definition has a wide variety of control functions, which can be found
using the help command.
> help(Control)
>
A set of R commands can be saved in a file and then executed as if you
had typed them in from the command line. The source command is used
to read the file and execute the commands in the same sequence given
in the file.
> source('file.R')
> help(source)
>
If you simply source the file, neither the commands nor the results
of the commands are printed automatically. Only output from explicit
calls such as cat and print appears. This can be overridden using the
echo, print.eval, and verbose options.
Some examples are given assuming that a file, simpleEx.R, is in the
current directory. The file is given below:
# Define a variable.
x <- rnorm(10)
# calculate the mean of x and print out the results.
mux = mean(x)
cat("The mean of x is ",mean(x),"\n")
# print out a summary of the results
summary(x)
cat("The summary of x is \n",summary(x),"\n")
print(summary(x))
The file also demonstrates the use of # to specify comments. Anything
after the # on a line is ignored. In addition, the file demonstrates
the use of cat and print to send results to the standard output. The
cat command has a file argument for sending results to a file, and
the output of print can be redirected with sink or capture.output.
Use help for more information.
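As a minimal sketch of sending results to a file, the example below uses the file and append arguments of cat and the file argument of capture.output. The scratch file created with tempfile is an assumption made only for this illustration.

```r
# Hypothetical output file; tempfile() just provides a scratch location.
outFile <- tempfile(fileext = ".txt")

x <- c(1, 2, 3, 4)

# cat() writes to the file instead of the standard output.
cat("The mean of x is", mean(x), "\n", file = outFile)

# append = TRUE adds to the file rather than overwriting it.
cat("The largest value is", max(x), "\n", file = outFile, append = TRUE)

# print() has no file argument, so capture.output() redirects what it prints.
capture.output(print(summary(x)), file = outFile, append = TRUE)

# Read the file back to see what was written.
fileLines <- readLines(outFile)
print(fileLines)
```

The first two lines of the file come from cat, and the remaining lines hold the printed summary.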
The output for the different options can be found below:
> source('simpleEx.R')
The mean of x is  -0.4817475
The summary of x is
-2.24 -0.5342 -0.2862 -0.4817 -0.1973 0.4259
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -2.2400 -0.5342 -0.2862 -0.4817 -0.1973  0.4259
>
>
>
> source('simpleEx.R',echo=TRUE)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-2.32600 -0.69140 -0.06772 -0.13540  0.46820  1.69600
>
>
>
> source('simpleEx.R',print.eval=TRUE)
The mean of x is  0.1230581
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-1.7020 -0.2833  0.1174  0.1231  0.9103  1.2220
The summary of x is
-1.702 -0.2833 0.1174 0.1231 0.9103 1.222
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-1.7020 -0.2833  0.1174  0.1231  0.9103  1.2220
>
>
>
> source('simpleEx.R',print.eval=FALSE)
The mean of x is  0.6279428
The summary of x is
-0.7334 -0.164 0.9335 0.6279 1.23 1.604
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.7334 -0.1640  0.9335  0.6279  1.2300  1.6040
>
>
>
>
> source('simpleEx.R',verbose=TRUE)
'envir' chosen:<environment: R_GlobalEnv>
encoding = "native.enc" chosen
--> parsed 6 expressions; now eval(.)ing them:
>>>> eval(expression_nr. 1 )
                 =================
> # Define a variable.
> x <- rnorm(10)
curr.fun: symbol <-
 .. after ‘expression(x <- rnorm(10))’
>>>> eval(expression_nr. 2 )
                 =================
> # calculate the mean of x and print out the results.
> mux = mean(x)
curr.fun: symbol =
 .. after ‘expression(mux = mean(x))’
>>>> eval(expression_nr. 3 )
                 =================
> cat("The mean of x is ",mean(x),"\n")
The mean of x is  -0.1090932
curr.fun: symbol cat
 .. after ‘expression(cat("The mean of x is ",mean(x),"\n"))’
>>>> eval(expression_nr. 4 )
                 =================
> # print out a summary of the results
> summary(x)
curr.fun: symbol summary
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-1.3820 -1.0550 -0.1995 -0.1091  0.6813  2.1050
 .. after ‘expression(summary(x))’
>>>> eval(expression_nr. 5 )
                 =================
> cat("The summary of x is \n",summary(x),"\n")
The summary of x is
 -1.382 -1.055 -0.1995 -0.1091 0.6813 2.105
curr.fun: symbol cat
 .. after ‘expression(cat("The summary of x is \n",summary(x),"\n"))’
>>>> eval(expression_nr. 6 )
                 =================
> print(summary(x))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-1.3820 -1.0550 -0.1995 -0.1091  0.6813  2.1050
curr.fun: symbol print
 .. after ‘expression(print(summary(x)))’
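The differences between the options can also be checked without looking at the console. The sketch below writes a small, hypothetical script to a scratch file (the file contents and variable names are assumptions for illustration, not the simpleEx.R file above) and collects what each call to source prints.

```r
# Hypothetical three-line script written to a scratch file.
scriptFile <- tempfile(fileext = ".R")
writeLines(c('x <- c(1, 2, 3)',
             'mean(x)',
             'cat("The mean of x is", mean(x), "\\n")'),
           scriptFile)

# capture.output() collects the lines each call would send to the console.
quiet   <- capture.output(source(scriptFile))
printed <- capture.output(source(scriptFile, print.eval = TRUE))
echoed  <- capture.output(source(scriptFile, echo = TRUE))

length(quiet)    # only the line written by cat()
length(printed)  # the cat() line plus the auto-printed value of mean(x)
length(echoed)   # the commands and their results together
```

With the default options only the explicit cat output appears, print.eval=TRUE adds the value of mean(x), and echo=TRUE adds the commands themselves.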
 
 
One common problem is that R may not know where to find a file.
> source('notThere.R')
Error in file(filename, "r", encoding = encoding) :
  cannot open the connection
In addition: Warning message:
In file(filename, "r", encoding = encoding) :
  cannot open file 'notThere.R': No such file or directory
R looks for the file in the current working directory unless a full
path is given. You can see what files are in the directory using the
dir command, and you can determine the current directory using the
getwd command.
> getwd()
[1] "/home/black/public_html/tutorial/R/rst/source/R"
> dir()
[1] "plotting.rData" "power.R"        "shadedRegion.R"
You can change the current directory, and the options available depend
on how you are using R. For example, on a Windows PC or a Macintosh
you can use the menu options to change the working directory and
choose the directory using a graphical file browser. Otherwise, you
can change to the correct directory before running R or use the
setwd command.
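A minimal sketch of changing the directory from within R is given below. The directory returned by tempdir stands in for the directory that holds your scripts, and the file name hypotheticalScript.R is an assumption made only for this illustration.

```r
# Remember the current directory so it can be restored afterwards.
oldDir <- getwd()

# tempdir() stands in for the directory that holds your scripts.
setwd(tempdir())

# Hypothetical script created here so that the example is self-contained.
writeLines('msg <- "sourced from the new working directory"',
           "hypotheticalScript.R")

# Check that the file is visible before trying to source it.
if (file.exists("hypotheticalScript.R")) {
    source("hypotheticalScript.R")
} else {
    cat("hypotheticalScript.R is not in", getwd(), "\n")
}

# Go back to the original directory.
setwd(oldDir)
```

Checking with file.exists first avoids the "cannot open the connection" error shown above.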
 
Conditional execution is available using the if statement and the
corresponding else statement.
> x = 0.1
> if( x < 0.2)
  {
     x <- x + 1
     cat("increment that number!\n")
  }
increment that number!
> x
[1] 1.1
 
 
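The else statement can be used to specify an alternate option. In the
example below note that the else statement must be on the same line
as the ending brace for the previous if block. Otherwise, when the
commands are typed at the prompt, R treats the if statement as
complete before it sees the else.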
> x = 2.0
> if ( x < 0.2)
 {
    x <- x + 1
    cat("increment that number!\n")
 } else
 {
    x <- x - 1
    cat("nah, make it smaller.\n");
 }
nah, make it smaller.
> x
[1] 1
 
 
Finally, if statements can be chained together for multiple options.
An if statement, together with its else, is considered a single
statement, so another if statement can be added after the else.
> x = 1.0
> if ( x < 0.2)
 {
    x <- x + 1
    cat("increment that number!\n")
 } else if ( x < 2.0)
 {
   x <- 2.0*x
   cat("not big enough!\n")
 } else
 {
    x <- x - 1
    cat("nah, make it smaller.\n");
 }
not big enough!
> x
[1] 2
The argument to the if statement is a logical expression that
evaluates to a single TRUE or FALSE value. A full list of logical
operators can be found in the types document focusing on logical
variables (Logical).
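As a small sketch of how logical expressions and if statements combine, the example below builds a condition with the && operator and also uses the fact that an if statement returns a value. The variable names and the labels "small", "medium", and "large" are assumptions for illustration.

```r
x <- 1.0

# Combine two comparisons into a single TRUE or FALSE value.
inRange <- (x >= 0.2) && (x < 2.0)

if (inRange) {
    cat("x is between 0.2 and 2\n")
}

# An if statement returns the value of the branch that is executed,
# so the result of a chain can be assigned directly.
size <- if (x < 0.2) "small" else if (x < 2.0) "medium" else "large"
size
```

Here the chain is written on one line, so each else follows the previous branch without a line break.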
 
The for loop can be used to repeat a set of instructions, and it is
used when you know in advance the values that the loop variable will
have each time it goes through the loop. The basic format for the
for loop is for(var in seq) expr.
An example is given below:
> for (lupe in seq(0,1,by=0.3))
 {
    cat(lupe,"\n");
 }
0
0.3
0.6
0.9
>
> x <- c(1,2,4,8,16)
> for (loop in x)
 {
    cat("value of loop: ",loop,"\n");
 }
value of loop:  1
value of loop:  2
value of loop:  4
value of loop:  8
value of loop:  16
See the section on breaks for more options (break and next statements)
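A for loop can also run over the positions of a vector rather than its values. The sketch below, which uses variable names chosen only for illustration, keeps a running total of the vector from the example above.

```r
x <- c(1, 2, 4, 8, 16)

total   <- 0
running <- numeric(length(x))   # storage for the running totals

# seq_along(x) gives the positions 1, 2, ..., length(x).
for (i in seq_along(x)) {
    total      <- total + x[i]
    running[i] <- total
}

running
```

Looping over positions is useful when the result of each pass has to be stored in a matching slot of another vector.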
 
The while loop can be used to repeat a set of instructions, and it
is often used when you do not know in advance how many times the
instructions will be executed. The basic format for a while loop is
while(cond) expr. An example is given below:
>
> lupe <- 1;
> x <- 1
> while(x < 4)
 {
    x <- rnorm(1,mean=2,sd=3)
    cat("trying this value: ",x," (",lupe," times in loop)\n");
    lupe <- lupe + 1
 }
trying this value:  -4.163169  ( 1  times in loop)
trying this value:  3.061946  ( 2  times in loop)
trying this value:  2.10693  ( 3  times in loop)
trying this value:  -2.06527  ( 4  times in loop)
trying this value:  0.8873237  ( 5  times in loop)
trying this value:  3.145076  ( 6  times in loop)
trying this value:  4.504809  ( 7  times in loop)
See the section on breaks for more options (break and next statements)
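Because the example above uses random numbers, its output changes from run to run. A deterministic sketch of the same pattern is given below, counting how many times a value must be doubled before it reaches 100. The variable names are assumptions for illustration.

```r
x <- 1
doublings <- 0

# The condition is checked before every pass through the loop.
while (x < 100) {
    x <- 2 * x
    doublings <- doublings + 1
}

cat("x is", x, "after", doublings, "doublings\n")
```

The loop stops the first time the condition is false, which happens here once x reaches 128.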
 
The repeat loop is similar to the while loop. The difference is
that it always executes the loop body at least once, whereas the
while loop only starts the loop if the condition is true the first
time it is evaluated. Another difference is that a repeat loop has no
condition of its own, so you have to stop the loop explicitly by
executing the break statement.
>  repeat
{
    x <- rnorm(1)
    if(x < -2.0) break
}
> x
[1] -2.300532
See the section on breaks for more options (break and next statements)
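Since a repeat loop only ends when break is executed, it can be worth adding a second exit as a safeguard. The sketch below is one way to do this. The seed and the limit of 1000 tries are arbitrary assumptions for illustration.

```r
set.seed(1)        # arbitrary seed so the run can be reproduced
maxTries <- 1000   # arbitrary upper limit on the number of passes
tries <- 0

repeat {
    x <- rnorm(1)
    tries <- tries + 1
    if (x < -2.0) break           # the value we were looking for
    if (tries >= maxTries) break  # give up rather than loop forever
}

cat("stopped after", tries, "tries with x =", x, "\n")
```

After the loop, checking whether x is below -2 tells you which of the two exits was taken.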
 
The break statement is used to stop the execution of the current
loop. The next statement is used to skip the statements that follow
it in the loop body and start the next pass through the current loop.
If a for loop is used then the next statement will update the loop
variable to its next value.
> x <- rnorm(5)
> x
[1]  1.41699338  2.28086759 -0.01571884  0.56578443  0.60400784
> for(lupe in x)
 {
     if (lupe > 2.0)
         next
     if( (lupe<0.6) && (lupe > 0.5))
        break
    cat("The value of lupe is ",lupe,"\n");
 }
The value of lupe is  1.416993
The value of lupe is  -0.01571884
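The same behavior can be seen without random numbers. In the sketch below, whose variable names are chosen only for illustration, next skips the even numbers and break ends the loop at the first odd number larger than 7.

```r
kept <- c()

for (i in 1:10) {
    if (i %% 2 == 0)
        next              # even number: skip to the next value of i
    if (i > 7)
        break             # leave the loop entirely
    kept <- c(kept, i)    # only reached for odd numbers up to 7
}

kept
```

The values 9 and 10 are never stored because the loop ends as soon as i reaches 9.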
 
The switch function takes an expression and returns one of a list of
items based on the value of the expression. How it does this depends
on the data type of the expression. The basic syntax is
switch(statement,item1,item2,item3,...,itemN).
If the result of the expression is a number, then switch returns the
item in the list with the same index. Note that a number that is not
an integer is truncated to an integer.
> x <- as.integer(2)
> x
[1] 2
> z = switch(x,1,2,3,4,5)
> z
[1] 2
> x <- 3.5
> z = switch(x,1,2,3,4,5)
> z
[1] 3
If the result of the expression is a string, then the list of items
should be in the form "valueN"=resultN, and the statement will
return the result that matches the value.
> y <- rnorm(5)
> y
[1]  0.4218635 -0.8205637 -1.0191267 -0.6080061 -0.6079133
> x <- "sd"
> z <- switch(x,"mean"=mean(y),"median"=median(y),"variance"=var(y),"sd"=sd(y))
> z
[1] 0.5571847
> x <- "median"
> z <- switch(x,"mean"=mean(y),"median"=median(y),"variance"=var(y),"sd"=sd(y))
> z
[1] -0.6080061
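It helps to know what switch does when nothing matches. In the sketch below an unnamed final item acts as a default for a string, and a number outside the range of the list gives NULL. The data values and the choice of NA as the default are assumptions for illustration.

```r
y <- c(1, 2, 3, 4, 10)

# A matching name returns the corresponding result.
z1 <- switch("median", "mean" = mean(y), "median" = median(y), NA)

# No name matches "mode", so the unnamed last item is returned.
z2 <- switch("mode", "mean" = mean(y), "median" = median(y), NA)

# A numeric index outside the list returns NULL.
z3 <- switch(7, "a", "b", "c")

z1
z2
is.null(z3)
```

Supplying a default makes it easier to notice a misspelled option, because the result is never silently NULL.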
 
The command to read input from the keyboard is the scan function. It
has a wide variety of options and can be fine-tuned to your specific
needs. We only look at the basics here. The scan function waits for
input from a user, and it returns the values that were typed in.
When the command is used without a set number of lines, it will
continue to read keyboard input until a blank line is entered.
> help(scan)
> a <- scan(what=double(0))
1: 3.5
2:
Read 1 item
> a
[1] 3.5
> typeof(a)
[1] "double"
>
> a <- scan(what=double(0))
1: yo!
1:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
scan() expected 'a real', got 'yo!'
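The scan function can also read from a string through its text argument, which makes it possible to try it without typing at the keyboard. The sketch below uses this together with tryCatch to handle input of the wrong type. The input strings and the choice to return NA on an error are assumptions for illustration.

```r
# Read three numbers from a string instead of the keyboard.
vals <- scan(text = "3.5 4 5", what = double(0), quiet = TRUE)
vals

# Input that is not a number raises an error, which tryCatch can intercept.
bad <- tryCatch(scan(text = "yo!", what = double(0), quiet = TRUE),
                error = function(e) NA_real_)
bad
```

The same tryCatch wrapper can be placed around a keyboard call to scan so that a typing mistake does not stop a script.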
 
 
A brief overview of defining functions is given here. A few
subtleties are noted, since R can be a little quirky with respect to
defining functions. The first bit of oddness is that a function is an
object: you define the function and assign it to a variable name.
The keyword function is used to denote the start of the function and
its argument list.
> newDef <- function(a,b)
 {
     x = runif(10,a,b)
     mean(x)
 }
> newDef(-1,1)
[1] 0.06177728
> newDef
function(a,b)
{
   x = runif(10,a,b)
   mean(x)
}
The last expression in the function is what is returned. So in the
example above the sample mean of the numbers is returned.
> x <- newDef(0,1)
> x
[1] 0.4800442
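The arguments that are passed are matched in order. They can also be
specified explicitly by name, in which case the order does not
matter. In the last call below the arguments are matched in order, so
a is 10 and b is 1. The lower limit passed to runif is then larger
than the upper limit, which produces NaN values.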
> newDef(b=10,a=1)
[1] 4.747509
> newDef(10,1)
[1] NaN
Warning message:
In runif(10, a, b) : NAs produced
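The effect of the argument order is easier to see with a function that does not use random numbers. The function span below is a hypothetical example written for this illustration.

```r
# Hypothetical function: the distance from a up to b.
span <- function(a, b)
{
    b - a
}

byPosition <- span(1, 10)          # a = 1, b = 10
byName     <- span(b = 10, a = 1)  # names given, so order does not matter
swapped    <- span(10, 1)          # a = 10, b = 1

c(byPosition, byName, swapped)
```

The first two calls agree, while the third swaps the roles of the two arguments and changes the sign of the result.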
You can mix these approaches. R first matches the named arguments and
then matches the remaining arguments from left to right. Another
bit of weirdness is that R will not evaluate an expression in the
argument list until the moment its value is needed in the function.
If the argument is never used, the expression is never evaluated.
This is different from what most people are used to, so be very
careful about this. A good rule of thumb is to avoid putting
operations such as assignments in an argument list if you rely on
their results after the function is called.
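Both points can be checked with a short sketch. The functions describeArgs and useFirst and the variable lazyValue are hypothetical names used only for this illustration.

```r
# Mixed matching: the named argument is matched first, and the
# remaining values fill the other arguments from left to right.
describeArgs <- function(first, second, third)
{
    paste("first =", first, "second =", second, "third =", third)
}
mixed <- describeArgs(second = 2, 1, 3)
mixed

# Delayed evaluation: b is only evaluated if the function uses it.
useFirst <- function(a, b)
{
    if (a > 0) "b was never needed" else b
}

res1 <- useFirst(1, lazyValue <- 5)     # b is not used
createdAfterFirst <- exists("lazyValue")

res2 <- useFirst(-1, lazyValue <- 5)    # b is used, so the assignment runs
createdAfterSecond <- exists("lazyValue")

c(createdAfterFirst, createdAfterSecond)
```

The assignment in the argument list only takes place in the second call, which is why relying on such side effects is risky.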
Another common task is to have a function return multiple items. This
can be accomplished by returning a list of items. The objects within
a list can be accessed using the same $ notation that is used for
data frames.
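> vals <- c(1,2,3,4,5)
> sampleStats <- function(a,b)
{
  # a names the statistic to compute and b holds the data
  value = switch(a,"median"=median(b),"mean"=mean(b),"variance"=var(b))
  # count the entries of b that are larger than one
  largeVals = length(b[b>1])
  list(stat=value,number=largeVals)
}
> result <- sampleStats("median",vals)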
> result
$stat
[1] 3
$number
[1] 4
> result$stat
[1] 3
> result$number
[1] 4
There is another potential problem that can occur when using a
function in R. When it comes to determining the value of a variable
there is a path that R will use to search for its value. When a
variable appears in a function, R first looks among the function's
arguments and the variables defined inside the function. If the
variable is not found there, R looks in the environment where the
function was defined, which is usually the current work space. If you
are not careful R will find the value some place where you do not
expect it, and your function will return a value that is not correct,
and no error will be given. Be very careful about the names of
variables, especially when using functions.
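A minimal sketch of this problem is given below. The names shift, addShift, and addShiftSafe are hypothetical and are used only for this illustration.

```r
shift <- 100

# shift is neither an argument nor defined inside the function,
# so R quietly uses the value found in the work space.
addShift <- function(a)
{
    a + shift
}
before <- addShift(1)

# Changing the variable in the work space changes the result,
# and no error or warning is given.
shift <- 0
after <- addShift(1)

# Passing the value as an argument removes the hidden dependence.
addShiftSafe <- function(a, shift)
{
    a + shift
}
safe <- addShiftSafe(1, 100)

c(before, after, safe)
```

The first function gives different answers for the same argument, while the second depends only on what is passed to it.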