Factors

Factors are special vectors used primarily for nominal and ordinal variables in the social sciences. A factor contains levels, which hold the descriptions of the values (called value labels in SPSS). The key difference from character vectors becomes visible when we apply the as.factor() function to the char object.

as.factor(char)
## [1] Taipeh Seoul  Berlin Taipeh
## Levels: Berlin Seoul Taipeh

We get not only the stored cities but also a second line starting with Levels. Levels are the unique values present in the vector. In this case there are only three, because Taipei (spelled Taipeh in the data) was mentioned by two respondents. This R-specific data type will be useful later for comparing groups and for graphical representations.
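The difference between the two types can also be checked directly in the console. The following is a minimal sketch; the definition of char is an assumption reconstructed from the output printed above, since the object itself was created earlier.

```r
# Assumption: char is reconstructed from the printed output above
char <- c("Taipeh", "Seoul", "Berlin", "Taipeh")

class(char)              # "character"
class(as.factor(char))   # "factor"

# A character vector has no levels, a factor does
levels(char)             # NULL
levels(as.factor(char))  # "Berlin" "Seoul" "Taipeh"

# table() counts how often each level occurs
table(as.factor(char))
```

By default the levels are sorted alphabetically, which is why Berlin comes first even though it appears third in the data.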

So far, as.factor(char) has only printed its result. The char object itself is unchanged, and the factor has not been saved anywhere. To keep it, you need to assign the result to an object:

charFactor <- as.factor(char)

We have now created a factor in the object charFactor. In the environment, we can see the following:

Factor in the Environment

The property now indicates that it is a factor with three different levels. Internally, the vector stores numerical values that are assigned to the respective levels. The numbering follows the order in which the levels are listed: Berlin has the value \(1\), Seoul has the value \(2\), and Taipei has the value \(3\). This type is often used for variables with a nominal or ordinal scale level. Important: a factor by itself does not imply any order between the values; it only distinguishes between them. Strictly speaking, a plain (unordered) factor is therefore only appropriate for nominal variables.
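The internal numerical values can be inspected with as.integer(). This is a small sketch under the same assumption as before: char is rebuilt here from the output shown earlier so that the example runs on its own.

```r
# Assumption: char is reconstructed from the output printed earlier
char <- c("Taipeh", "Seoul", "Berlin", "Taipeh")
charFactor <- as.factor(char)

levels(charFactor)      # "Berlin" "Seoul" "Taipeh"
as.integer(charFactor)  # 3 2 1 3
nlevels(charFactor)     # 3

# A factor created this way carries no order
is.ordered(charFactor)  # FALSE

# str() prints a compact summary of the type, the levels, and the codes
str(charFactor)
```

The codes 3 2 1 3 are positions in the level list, not measurements, so arithmetic on them is not meaningful.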

However, we can also specify an order within a factor, making it correspond to an ordinal scale level. For example, we can create an object that includes the grading in the American system for six people. The values are: \(A\), \(C\), \(D\), \(B\), \(C\), and \(A\). As a reminder, in the American system, \(A\) is the best grade and \(D\) is the worst grade. The order of the values is as follows: \(D < C < B < A\).

To do this, we first create an object that includes the six values. We use the function c() for this.

grade <- c(
  "A",
  "C",
  "D",
  "B",
  "C", 
  "A"
)

grade
## [1] "A" "C" "D" "B" "C" "A"

Then we use the function factor() and create an ordered factor. The factor() function requires three arguments for this. In the first argument, we enter the data that should be ordered. In the second argument, we specify ordered = TRUE to indicate that not only a factor should be created but also an order should be considered. In the third argument, we specify the actual order in levels = ... (ascending from \(D\) to \(A\)).

gradeOrd <- factor(
  grade,  # 1st argument: data input
  ordered = TRUE,  # 2nd argument: create an ordered factor
  levels = c(    # 3rd argument: the order of the levels, ascending
    "D",
    "C",
    "B",
    "A"
  )
)

gradeOrd
## [1] A C D B C A
## Levels: D < C < B < A

In the environment, we then see the property Ord. factor and again the specification of the levels and their order.

Ordered Factor in Environment

As before, two pieces of information are stored: the levels with the grades, and the numerical values corresponding to those levels. The numerical values are again assigned in the order of the levels: \(D\) gets the value \(1\), \(C\) gets the value \(2\), \(B\) gets the value \(3\), and finally \(A\) gets the value \(4\).
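The following sketch checks these codes and shows what the ordering makes possible. It repeats the definitions of grade and gradeOrd from above in compact form so that it can be run on its own.

```r
grade <- c("A", "C", "D", "B", "C", "A")
gradeOrd <- factor(grade, ordered = TRUE, levels = c("D", "C", "B", "A"))

# Numerical codes follow the order of the levels (D = 1, ..., A = 4)
as.integer(gradeOrd)  # 4 2 1 3 2 4
class(gradeOrd)       # "ordered" "factor"

# Because the factor is ordered, comparisons are meaningful
gradeOrd >= "B"       # TRUE FALSE FALSE TRUE FALSE TRUE

# The best grade in the data, and the grades sorted from worst to best
max(gradeOrd)
sort(gradeOrd)
```

With an unordered factor, comparisons such as >= are not meaningful; R returns NA with a warning in that case.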

Things stay interesting as we move on to datasets.