<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>2222-3436</journal-id>
<journal-title><![CDATA[South African Journal of Economic and Management Sciences ]]></journal-title>
<abbrev-journal-title><![CDATA[S. Afr. j. econ. manag. sci. (Online)]]></abbrev-journal-title>
<issn>2222-3436</issn>
<publisher>
<publisher-name><![CDATA[University of Pretoria]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S2222-34362012000100004</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[On employees' performance appraisal: the impact and treatment of the raters' effect]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Zewotir]]></surname>
<given-names><![CDATA[Temesgen]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[University of KwaZulu-Natal, School of Mathematics, Statistics and Computer Science ]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>00</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>00</month>
<year>2012</year>
</pub-date>
<volume>15</volume>
<numero>1</numero>
<fpage>44</fpage>
<lpage>54</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.za/scielo.php?script=sci_arttext&amp;pid=S2222-34362012000100004&amp;lng=en&amp;nrm=iso&amp;tlng=en"></self-uri><self-uri xlink:href="http://www.scielo.org.za/scielo.php?script=sci_abstract&amp;pid=S2222-34362012000100004&amp;lng=en&amp;nrm=iso&amp;tlng=en"></self-uri><self-uri xlink:href="http://www.scielo.org.za/scielo.php?script=sci_pdf&amp;pid=S2222-34362012000100004&amp;lng=en&amp;nrm=iso&amp;tlng=en"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[By putting in place a performance appraisal scheme, employees who improve their work efficiency can then be rewarded, whereas corrective action can be taken against those who don't. The aim of this paper is to develop a technique that helps to measure the subjective effect that a given rater's assessment will have on the performance appraisal of a given employee, assuming that an assessment of one's work performance will have to be undertaken by a rater and that this rating is essentially a subjective one. In particular, a linear mixed modelling approach will be applied to data that comes from a South African company which has 214 employees and where an annual performance evaluation has been run. One of the main conclusions that will be drawn from this study, is that there is a very significant rater's effect that needs to be properly accounted for when rewarding employees. Without this adjustment being done, any incentive scheme, whether its motive is reward based or penalty based, will ultimately fail in its intended purpose of improving employees' overall performance]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[raters' effect]]></kwd>
<kwd lng="en"><![CDATA[performance appraisal]]></kwd>
<kwd lng="en"><![CDATA[model diagnostics]]></kwd>
<kwd lng="en"><![CDATA[mixed model]]></kwd>
<kwd lng="en"><![CDATA[fixed effect]]></kwd>
<kwd lng="en"><![CDATA[best linear unbiased predictor]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[ <p align="right"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>ARTICLES</b></font></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="4"><b>On employees'    performance appraisal: the impact and treatment of the raters' effect</b></font></p>     <p>&nbsp;</p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Temesgen Zewotir</b></font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">School of Mathematics,    Statistics and Computer Science, University of KwaZulu-Natal</font></p>     <p>&nbsp;</p>     <p>&nbsp;</p> <hr size="1" noshade>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>ABSTRACT</b></font></p>     ]]></body>
<body><![CDATA[<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">By putting in place    a performance appraisal scheme, employees who improve their work efficiency    can then be rewarded, whereas corrective action can be taken against those who    don't. The aim of this paper is to develop a technique that helps to measure    the subjective effect that a given rater's assessment will have on the performance    appraisal of a given employee, assuming that an assessment of one's work performance    will have to be undertaken by a rater and that this rating is essentially a    subjective one. In particular, a linear mixed modelling approach will be applied    to data that comes from a South African company which has 214 employees and    where an annual performance evaluation has been run. One of the main conclusions    that will be drawn from this study, is that there is a very significant rater's    effect that needs to be properly accounted for when rewarding employees. Without    this adjustment being done, any incentive scheme, whether its motive is reward    based or penalty based, will ultimately fail in its intended purpose of improving    employees' overall performance.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Key words:</b>    raters' effect; performance appraisal; model diagnostics; mixed model; fixed    effect; best linear unbiased predictor    <br>   <b>JEL: C210, 49, M49</b></font></p> <hr size="1" noshade>     <p>&nbsp;</p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>1 Introduction</b></font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Yearly performance    reviews are seen as critically important for ensuring the success of public    entities and private companies (Saxena, 2010). 
Their aim is to induce workers    to become more efficient and effective (Kondrasuk, 2011), and help supervisors    to become more transparent in the way they interact with their workers. As a    result, workers begin to have a better understanding of their supervisors' expectations,    leading to a greater sense of ownership of their duties and thus improved work    performance. Ignoring these performance issues will ultimately decrease morale,    which in turn will lead to a drop-off in the company's overall level of performance    as management wastes time rectifying what isn't being done properly (Grote,    1996). Thus an effective performance appraisal can provide huge benefits for    the employer in terms of increased staff productivity, knowledge, loyalty and    participation (Margrave &amp; Gorden, 2001).</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">How one best measures    the performance of an employee, however, can be significantly affected by what    has become known as a horns and halos effect. This refers to the effect of one    person's judgment of another being unduly influenced by a first impression.    A selective perception problem, the term 'horns' refers to an unfavorable first    impression, while the term 'halo' refers to a favorable impression. Ideally    one would like to minimise the effect that a first impression has on a final    rating, but this selective perception bias has been observed in the behaviour    of all raters, and is therefore known as raters' effect (Wolfe, 2004).</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Due to the complexity    of the job performance and interpersonal relations at work, much of the existing    research typically indicates that raters account for significant proportions    of the variance in employees' true performance (Woehr et al, 2005; Hoffman &amp;    Woehr, 2009; Hoffman et al, 2010). 
It is therefore in the interests of both    the organisation and the individual to maximise the effectiveness of performance    appraisal by reducing the rater errors (see for example, Aguinis &amp; Pierce,    2008; Uggerslev &amp; Sulsky, 2008; Ferris, 2008; Ogunfowora, 2010). Most of    the studies focus on the rating strategies before the rating rather than attending    to rating outcomes.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Therefore, the    purpose of this study is to introduce a statistical method to (i) demonstrate    the plausibility of rater source factors at the performance appraisal; (ii)    to identify (and adjust for) the magnitude of raters' effect and thereby rank    the 'best' and 'worst' performers, and (iii) identify deviant ratings. Hence,    this study contributes to the literature by attempting to clarify the structure    of raters' effect, the existence and nature of raters' effect, and the relative    proportion of variance accounted for by the raters' influence on performance    ratings.</font></p>     ]]></body>
<body><![CDATA[<p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>2 The data and purpose of the analysis</b></font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The South African based company<a name="top1"></a><a href="#back1"><sup>1</sup></a> has 214 employees. All were included in the study, as each employee was part of an annual performance appraisal scheme. For each project (or activity) in which he/she was involved, the employee was given a rating on a continuous scale ranging from 0 to 25, with a higher rating indicating better performance. The ratings were performed by 85 evaluators. The complexity of the tasks that the employees were asked to perform was also taken into consideration by the evaluators when rating.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">To help mitigate the effect of using different raters, all 85 raters received some form of training (i) to familiarise themselves with the measures that they would be working with, (ii) to ensure that they understood the sequence of steps that they would have to follow in their assessment and (iii) to explain how they should interpret any normative data that they would be given. More details about the data can be obtained from Zewotir (2001).</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">If one were able to use all 85 raters to rate each and every employee in the firm, raters' training would minimise rater effects, as the effects would be the same (Pulakos, 1986; Houston et al., 1991). No single employee would run the risk of having a lower or higher overall rating, as all the employees would receive the same benefit or penalty from the raters' subjective leniency or harshness. 
In the firm that    we studied, however, not every employee was able to be rated by the same set    of raters. In particular, <a href="#t1">Table 1</a> shows how some raters evaluated    several employees whereas others only rated a few employees. It should be noted    that in <a href="#t1">Table 1</a> there are 340 ratings of 214 employees because    some employees were involved in a number of projects (or activities) and accordingly    had multiple raters.</font></p>     <p><a name="t1"></a></p>     <p>&nbsp;</p>     <p align="center"><img src="/img/revistas/sajems/v15n1/04t01.jpg"></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The difference    between the rating that will be assigned by a single rater and the average rating    that will be assigned by all 85 raters is called the 'raters' effect'. Clearly,    if this raters' effect is non zero, then employees that have been evaluated    by a different set of multiple raters may receive an unfair (i.e. biased) score    primarily because they have faced a relatively lenient or relatively harsh set    of judges when compared with the other employees in the firm. In this case,    an adjustment to a given employee's average score should be made, which takes    into account the potential bias that may arise because a different set of raters    has been used. Simply averaging the score given by each rater to an employee    will not adjust this raters' effect. In the next section we will develop a method    that attempts to account for a raters' effect. Once this has been done, we can    then separate 'good' performers from 'poor' performers and reward them accordingly.</font></p>     ]]></body>
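<body><![CDATA[<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">To illustrate why simple averaging does not remove the raters' effect, the following minimal Python sketch uses a small hypothetical data set (three employees A, B and C and two raters labelled 'lenient' and 'harsh'; none of these names or scores come from the company data). It compares crude averages with employee levels obtained from an additive fit, score = employee level + rater effect, computed by alternating averages with the rater effects centred at zero. This is a simplified least-squares illustration under these assumptions, not the linear mixed model developed in the next section.</font></p>
<pre>
```python
from collections import defaultdict

# Hypothetical ratings (employee, rater, score); not taken from the study's data.
ratings = [
    ("A", "lenient", 22.0),
    ("B", "harsh", 18.0),
    ("C", "lenient", 20.0),
    ("C", "harsh", 16.0),
]


def crude_means(rows):
    """Average each employee's scores, ignoring who gave them."""
    by_emp = defaultdict(list)
    for emp, _, score in rows:
        by_emp[emp].append(score)
    return {emp: sum(v) / len(v) for emp, v in by_emp.items()}


def adjusted_means(rows, n_iter=200):
    """Fit score = employee level + rater effect by alternating averages.

    Rater effects are centred to average zero so that the split between
    employee levels and rater effects is identifiable.
    """
    rater_eff = {r: 0.0 for _, r, _ in rows}
    emp_level = {}
    for _ in range(n_iter):
        # Employee level: mean score after removing the current rater effects.
        acc = defaultdict(list)
        for emp, r, score in rows:
            acc[emp].append(score - rater_eff[r])
        emp_level = {e: sum(v) / len(v) for e, v in acc.items()}
        # Rater effect: mean residual after removing the employee levels.
        acc = defaultdict(list)
        for emp, r, score in rows:
            acc[r].append(score - emp_level[emp])
        rater_eff = {r: sum(v) / len(v) for r, v in acc.items()}
        # Centre the rater effects at zero.
        shift = sum(rater_eff.values()) / len(rater_eff)
        rater_eff = {r: b - shift for r, b in rater_eff.items()}
    return emp_level, rater_eff


crude = crude_means(ratings)
adjusted, rater_effects = adjusted_means(ratings)
print("crude averages:", crude)
print("adjusted levels:", adjusted)
print("rater effects:", rater_effects)
```
</pre>
<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">In this toy design employee A is rated only by the lenient rater and employee B only by the harsh one, so the crude averages (22, 18 and 18) separate A from B and tie B with C, whereas the adjusted levels are approximately 20, 20 and 18.</font></p>
]]></body>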
<body><![CDATA[<p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>3 Formulation    of the model</b></font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">A classical example    of testing for inter-rater reliability is described by Fliess (1986) in the    context of a medical situation where depressive patients are being rated by    several psychiatrists, and there is a restriction on the number of examinations    that a patient can undergo. However, this method cannot be used in our context    of performance appraisal because the rater who is evaluating a given employee    is someone who has a detailed knowledge of that person's performance, i.e. the    random assignment of employees to any given evaluator is not possible in our    context. Furthermore, one is not necessarily able to restrict the number of    employees that each rater sees, or vice versa.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Some researchers    have suggested that one calculate a mean performance score for each employee    and then rank the employees based on their mean performance. As has already    been noted, because the set of raters being used differs from one employee to    the next, simply ranking the mean performance scores of each employee will not    remove the rater bias in this procedure (Russell, 2000). Other researchers have    attempted to develop an analysis of variance-based raw scores (Braun, 1988;    de Gruijter, 1984; Houston et al., 1991) or a multifaceted Rasch model (Wolfe    et al., 2001; Wolfe 2004). Such a model however requires that one make use of    a Likert scale when rating an employee's performance (like Excellent, Very good,    Good, Fair, Poor).</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">In our modelling    context the rating that is given is not based on a Likert scale. 
In order to develop a performance score for a given employee and to correct this score for a possible raters' effect, we will use a linear mixed model, i.e.</font></p>     <p align="center"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">y<sub>ij</sub>    = &#181; + </font><font size="2">&#945;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>i</sub>    + </font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>j</sub>    + </font><font size="2">&#949;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>ij</sub></font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">where y<sub>ij</sub> denotes the appraisal score given to the i<sup>th</sup> employee by rater j, &#181; denotes an overall mean score, </font><font size="2">&#945;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>i</sub> denotes the deviation of employee i from this overall mean score, </font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>j</sub> denotes the j<sup>th</sup> rater's effect and </font><font size="2">&#949;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>ij</sub> is an error term. 
In particular, we will assume that the </font><font size="2">&#945;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>i</sub>s    are independent identically distributed normal random variables with a mean    0 and variance </font><font size="2">&#963;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>1</sub><sup>2</sup>,    and the </font><font size="2">&#949;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>ij</sub>s    are independent identically distributed normal random error terms with mean    0 and variance </font><font size="2">&#963;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>0</sub><sup>2</sup>,    respectively. Focusing on the model parameter </font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>j</sub>    some of the management group may want to look only at the 85 raters, in which    case the raters' effect </font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>j</sub>    should be treated as being a fixed effect. On the other hand, some may argue    that the 85 evaluators are representatives from a population of raters, in which    case the raters' effect should be treated as being a random effect.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Instead of arguing    about whether this raters' effect should be fixed or random, we will construct    two models: one with a raters' effect that is fixed and another where we treat    this raters' effect </font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>j</sub>    as being an independent identically distributed normal random variable with    a mean 0 and variance </font><font size="2">&#963;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>2</sub><sup>2</sup>.    
We will also assume that </font><font size="2">&#945;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>i</sub>,    </font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>j</sub>    and </font><font size="2">&#949;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>ij</sub>    are distributed independently of each other. The resulting model then becomes    a linear random effects model. A detailed discussion about linear random effect    models can be found in, among others, Harville (1990), Robinson (1991), Searle    et al. (2006) and SAS Institute (1992). The main focus of interest in this model    is the variance of the raters' effect, </font><font size="2">&#963;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>2</sub><sup>2</sup>.    If </font><font size="2">&#963;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>2</sub><sup>2</sup>    = 0, then the data supports the hypothesis that the raters' effect is constant    or identical. In other words, employees receive an identical bias from any rater    that is assigned by the company implying that there is no need to adjust the    employee's score with respect to a raters' effect. On the other hand, if the    hypothesis </font><font size="2">&#963;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>2</sub><sup>2</sup>    = 0 is not supported by the data, then different raters have a different level    of leniency/severity that they employ when judging an employee's performance,    and thus the employee's score should be adjusted to account for this effect.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">In a fixed effects    model our main interest will focus on whether the </font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>j</sub>s    are identical for all j = 1, 2,...,85. 
Such a model is known as a two-way mixed effects model (see, for example, Little et al., 2000; Skrondal &amp; Rabe-Hesketh, 2004; McCulloch et al., 2008). If the data supports the hypothesis H<sub>0</sub>: </font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>1</sub>=</font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>2</sub>=...=</font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>85</sub>, then the employees receive an identical bias from all 85 raters, so that there is no need to adjust the employee's score for the raters' effect.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">An important component of this model is a measure of its reliability. This measure, sometimes called the intra-class correlation (ICC) coefficient and denoted </font><font size="2">&#961;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">, is defined as the proportion of the total variance of the scores that can be attributed to the true performance score.</font></p>     ]]></body>
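<body><![CDATA[<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">As a numerical illustration of this reliability measure, the sketch below computes the intra-class correlation as a ratio of variance components. It assumes the usual variance-ratio form, namely &#963;<sub>1</sub><sup>2</sup>/(&#963;<sub>1</sub><sup>2</sup> + &#963;<sub>0</sub><sup>2</sup>) when the raters' effect is fixed and &#963;<sub>1</sub><sup>2</sup>/(&#963;<sub>1</sub><sup>2</sup> + &#963;<sub>2</sub><sup>2</sup> + &#963;<sub>0</sub><sup>2</sup>) when it is random. This explicit form, the function name icc and the numerical values are illustrative assumptions and are not estimates from the study.</font></p>
<pre>
```python
def icc(var_employee, var_error, var_rater=0.0):
    """Share of the total score variance attributed to employees.

    var_employee: variance of the employee effects (sigma_1 squared)
    var_error:    residual variance (sigma_0 squared)
    var_rater:    variance of the rater effects (sigma_2 squared);
                  left at zero when raters are treated as fixed
    """
    total = var_employee + var_rater + var_error
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_employee / total


# Hypothetical variance components, chosen only for illustration.
rho_fixed = icc(var_employee=3.0, var_error=1.0)
rho_random = icc(var_employee=3.0, var_error=1.0, var_rater=1.0)
print(round(rho_fixed, 2), round(rho_random, 2))
```
</pre>
<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">With these hypothetical inputs, adding a rater variance component to the denominator lowers the share of variance attributed to true performance, which is why the two model formulations can report different reliabilities for the same data.</font></p>
]]></body>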
<body><![CDATA[<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The estimation of the employee based effects </font><font size="2">&#945;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>i</sub> will make use of a technique known as Best Linear Unbiased Prediction (BLUP). BLUP is a class of statistical tools that has some desirable properties (Robinson, 1991; SAS, 1992; Searle et al., 1996; McCulloch et al., 2008). The term 'best' in the acronym BLUP describes the property that, given the available data on an employee, the employee's predicted true performance has the smallest possible prediction error among linear unbiased predictors. The term 'linear' means that the predictor is a linear function of the observed scores, i.e. the data are not transformed to some other scale, for example by squaring. 'Unbiasedness' means that, on average, the predicted true performance will be the same as the employee's true performance. 'Prediction' refers to the task at hand: trying to predict true performance.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Once a BLUP has been obtained for each one of the employee based effects, a hypothesis test can be constructed by noting that the standardised BLUPs follow a Student's t-distribution with degrees of freedom equal to the denominator degrees of freedom (ddf). One can then pinpoint the i<sup>th</sup> employee as being a significantly good/bad performer if the absolute value of the standardised BLUP is greater than t(1-</font><font size="2">&#945;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">/2; ddf), where </font><font size="2">&#945;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"> here denotes the significance level and t(1-</font><font size="2">&#945;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">/2; ddf) is the 1-</font><font size="2">&#945;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">/2 quantile of Student's t-distribution with ddf degrees of freedom. 
For exceptionally    good performers, the estimate will be positive valued and for bad performers    it will be negative valued.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Model diagnostics    also form an important part of statistical modelling. Zewotir and Galpin (2004,    2005 and 2007) have outlined some formal and informal procedures that can be    used to help detect outliers, influential points and specific departures from    underlying assumptions in the linear mixed models. These procedures will also    be employed in this paper.</font></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>4 Results and    discussions</b></font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>4.1 Without    an adjustment for the raters' effect</b></font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">One can perform    an analysis without adjusting for the raters' effect, by simply using the average    score that has been assigned by all the raters to a given employee. Using this    approach, the best and worst performers are presented in <a href="#t2">Table    2</a>.</font></p>     <p><a name="t2"></a></p>     <p>&nbsp;</p>     <p align="center"><img src="/img/revistas/sajems/v15n1/04t02.jpg"></p>     ]]></body>
<body><![CDATA[<p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>4.2 Adjusted    model 1: Including a raters' effect as a fixed effect</b></font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Results for the    rater fixed effects model are given in <a href="#t3">Table 3</a>. The rater    row of <a href="#t3">Table 3</a> is testing whether the rater effect parameter    estimates that we have obtained are significantly different from zero. The very    small p-value that we have obtained (p = 0.0001) indicates that the hypothesis    H<sub>0</sub>: </font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>1</sub>=</font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>2</sub>=...=</font><font size="2">&#946;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>85</sub>    = 0 can be rejected. This clearly shows the existence of a rater bias in the    scores given to different employees of the firm.</font></p>     <p><a name="t3"></a></p>     <p>&nbsp;</p>     <p align="center"><img src="/img/revistas/sajems/v15n1/04t03.jpg"></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The variance parameter    estimate for </font><font size="2">&#963;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>1</sub><sup>2</sup>    that is given in <a href="#t3">Table 3</a> indicates that there is also variability    in the performance between employees that is statistically significant and therefore    needs to be accounted for. 
In fact 73% of the total variance associated with    the employees' score is attributable to the true performance score variability    of the employees, </font><font size="2">&#963;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>1</sub><sup>2</sup>.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a href="#t4">Table    4</a> provides a ranking of employees based on the BLUPs that have been obtained    for </font><font size="2">&#945;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><sub>i</sub>.    The results need to be interpreted as a continuum where large negative values    indicate a poor performance and large positive values indicate an excellent    performance. An estimate for each employee's true performance score can then    be obtained by adding the appropriate BLUP score that has been given in <a href="#t4">Table    4</a> for a given employee to the overall mean estimate of 19.39 that has been    given in <a href="#t3">Table 3</a>.</font></p>     <p><a name="t4"></a></p>     ]]></body>
<body><![CDATA[<p>&nbsp;</p>     <p align="center"><img src="/img/revistas/sajems/v15n1/04t04.jpg"></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Unlike the results    in <a href="#t2">Table 2</a>, the results in <a href="#t4">Table 4</a> account    for the bias from raters and adjust the employees' score for this rater's effect.    Besides the adjustment for the raters' bias, <a href="#t4">Table 4</a> accounts    for the variability of the employee score. For instance, employee 100 was not    listed as one of the poor performers in <a href="#t2">Table 2</a>, but is listed    as the second poorest performer in <a href="#t4">Table 4</a>. When we scrutinise    the evaluation report of employee 100 we see that employee 100 was rated by    two raters (raters 32 and 58, with a score of 15.9 and 12.5 respectively). However,    these two raters rated other employees; for example rater 32 rated eight employees    and gave them scores of 18.2, 21.8, 16.6, 22.3, 20.6, 17, 19.6 and 15.9 respectively    and rater 58 rated two employees with scores of 20 and 12.5 respectively. From    the two raters we note that the score of employee 100 is the lowest. Moreover,    by tracing back to determine how raters 32 and 58 rated other employees relative    to the other raters, we note that, on average, raters 32 and 58 tended to be    more lenient. With all these considerations in the model, the predicted performance    score for employee 100 then becomes a significantly negative score, as given    in <a href="#t4">Table 4</a>. But the crude average score of employee 100, 14.2,    would not place this employee among the worst performers. 
Likewise, by adjusting for the raters' effect, employee 15 becomes one of the top performers, as shown in <a href="#t4">Table 4</a>, whereas this employee was not listed as a top performer in <a href="#t2">Table 2</a>.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a href="#f1">Figure 1</a> contains a set of plots that can be used to assess the normality assumption and the goodness of fit of the model. The plots indicate no recognisable outlier in the data. A more formal test (as outlined in Zewotir &amp; Galpin, 2007) likewise did not flag the maximum absolute Studentised residual as an outlier. The normal probability plot is linear, which indicates that the assumption of normality is reasonable. The linearity of the plot is also supported by the W-statistic, which is an adaptation of Shapiro and Wilk's (1965) normality test to a linear mixed model (Zewotir &amp; Galpin 2004). In particular, the recorded result (W = 0.9777, p = 0.0665) is consistent with the normal distribution.</font></p>     <p><a name="f1"></a></p>     <p>&nbsp;</p>     <p align="center"><img src="/img/revistas/sajems/v15n1/04f01.jpg"></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Focusing on those observations that could be potential outliers for our study, it was found that observation numbers 123 and 246 were the most influential observations. When these observations were removed, however, no significant change in the parameter estimates or goodness of fit of the resulting model was recorded. Nevertheless, because we are dealing with people whom we may want to incentivise, it could be argued that one would like to examine these two observations more carefully.</font></p>     ]]></body>
<body><![CDATA[<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Observation number 123 contains a score of 15 for employee 72 that was given by rater 27. This same employee was also rated by three other people (namely, raters 21, 37 and 58), who gave that employee scores of 22, 18 and 19.6, respectively. It should be noted that rater 27 also rated nine other employees and the score of employee 72 was the lowest given by rater 27. Raters 21, 37 and 58, however, put employee 72 as their 3<sup>rd</sup>, 3<sup>rd</sup> and 2<sup>nd</sup> highest performing employee, respectively.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Case number 246 deals with employee 155, who was rated by a single person (rater 35) and was given a score of 20. It should be mentioned that rater 35 rated seven employees in total (employees 33, 73, 87, 155, 162, 194, and 202), giving them scores of 13, 15, 9, 20, 12, 18, and 15, respectively. In terms of the ratings that these employees received from other raters, the score of rater 35 was found to be the lowest for five of these employees and the second lowest for another one of these employees. Because of this obvious downward bias in the rating record of rater 35, when an adjustment is made to employee 155's score, the predicted performance score for employee 155 becomes very large, as reflected in <a href="#t4">Table 4</a>.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>4.3 Including a raters' effect as a random effect</b></font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Maximum likelihood estimates for the model parameters and the associated tests of significance are presented in <a href="#t5">Table 5</a>. 
The results indicate that the rater and employee effects are significant.</font></p>     <p><a name="t5"></a></p>     <p>&nbsp;</p>     <p align="center"><img src="/img/revistas/sajems/v15n1/04t05.jpg"></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Our formal outlier testing procedure does not label any observation as an outlier. A graph of the residuals is given in <a href="#f2">Figure 2</a>. None of the observations appears to be separated from the bulk of the other observations. The normal probability plot does not indicate a serious violation of the normality assumption. The summary statistic (W = 0.978, p = 0.0860) is likewise consistent with the normality assumption.</font></p>     <p><a name="f2"></a></p>     ]]></body>
<body><![CDATA[<p>&nbsp;</p>     <p align="center"><img src="/img/revistas/sajems/v15n1/04f02.jpg"></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">A prediction of the true performance of each employee shows that ten employees (see <a href="#t6">Table 6</a>) can be regarded as performing exceptionally badly or well. For exceptionally good performers the estimate is positive and for bad performers it is negative. Furthermore, the prediction of an employee's true performance is obtained by adding the estimate given in <a href="#t6">Table 6</a> to the overall mean that we obtained for the model.</font></p>     <p><a name="t6"></a></p>     <p>&nbsp;</p>     <p align="center"><img src="/img/revistas/sajems/v15n1/04t06.jpg"></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">All the worst and best performers given in <a href="#t6">Table 6</a> were also identified as the worst and best performers in <a href="#t4">Table 4</a>. The results in <a href="#t6">Table 6</a> are driven only by the consistency of each employee's ratings and the overall variability in the harshness and leniency shown by the 85 raters. The results in <a href="#t4">Table 4</a>, by contrast, are driven by the average leniency or harshness of the raters who rated each employee and by the employees' performance. Since employees who were rated by fewer raters have a less reliable performance predictor, most of the worst or best performers in <a href="#t4">Table 4</a> who were rated by only one rater do not carry over into <a href="#t6">Table 6</a>. For instance, consider employees 37 and 86 from the top performers given in <a href="#t4">Table 4</a>. Employee 37 was rated by three raters, with scores of 21, 22 and 22. 
On the other hand, employee 86 was rated by a single rater, with a score of 22.3. Employee 37 is a consistent performer and leads the top performers in <a href="#t6">Table 6</a>, but not so employee 86.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Since the raters' effects were treated as random effects, we obtain BLUP estimates of the realised raters' effects. These estimates reveal the harshness or leniency displayed by raters in their judgments. <a href="#t7">Table 7</a> provides the extreme rankings of raters based on the BLUP estimates of the raters' latent effects: large negative values indicate a harsh rater and large positive values indicate a lenient rater. Rater 23, who evaluated thirteen employees and gave them the scores 11, 19, 18, 12, 15, 16, 12, 15, 10, 14, 16, 14 and 13, can be viewed as the harshest rater. Similarly, rater 31, who evaluated six employees and gave them the scores 21, 21, 24, 22, 22 and 21, can be viewed as the most lenient rater.</font></p>     ]]></body>
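<body><![CDATA[<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The contrast between employees 37 and 86 can be illustrated with a deliberately simplified sketch. It assumes a one-way random-intercept model with an employee effect only, so it ignores the rater effect that the model in this paper also includes. Under that assumption the predicted employee effect is the deviation of the raw mean score from the overall mean, multiplied by the shrinkage weight n*v_e / (n*v_e + v_r), where n is the number of ratings, v_e the employee variance and v_r the residual variance. The overall mean (17) and both variance components (4 each) are hypothetical values chosen for illustration; they are not the estimates reported in Table 5, and the function name is likewise an assumption.</font></p>

```python
def shrunken_effect(scores, overall_mean, var_employee, var_error):
    """Predicted employee effect under a one-way random-intercept model.

    Simplifying assumption: only an employee random effect is present,
    so the rater effect of the full model is ignored here.
    """
    n = len(scores)
    raw_mean = sum(scores) / n
    # Shrinkage weight: approaches 1 as the number of ratings n grows.
    weight = n * var_employee / (n * var_employee + var_error)
    return weight * (raw_mean - overall_mean)


# Hypothetical values, NOT the estimates from the paper.
MU = 17.0            # assumed overall mean score
VAR_EMPLOYEE = 4.0   # assumed between-employee variance
VAR_ERROR = 4.0      # assumed residual variance

# Scores quoted in the text for employees 37 and 86.
blup_37 = shrunken_effect([21, 22, 22], MU, VAR_EMPLOYEE, VAR_ERROR)
blup_86 = shrunken_effect([22.3], MU, VAR_EMPLOYEE, VAR_ERROR)

print(f"employee 37: {blup_37:.2f}")
print(f"employee 86: {blup_86:.2f}")
```

<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Under these assumed values the single score of 22.3 is shrunk by one half, whereas the mean of three scores is shrunk by only one quarter, so employee 37 ends up with the larger predicted effect despite the lower raw average. This mirrors, in a much simpler setting, why a performer rated by a single rater is less likely to remain among the extremes.</font></p>     ]]></body>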
<body><![CDATA[<p><a name="t7"></a></p>     <p>&nbsp;</p>     <p align="center"><img src="/img/revistas/sajems/v15n1/04t07.jpg"></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">With regard to possibly influential observations, observation numbers 69 and 297 were flagged in the analysis. Omitting both cases from the analysis did not substantially change the estimates that we obtained for the variance parameters or the overall goodness of fit of the model. It is interesting to note, however, that case 69 represents a score of 24 that was given to employee 39 by rater 43. This score was in fact the largest score that was given by any one rater to any one employee. The next highest score received by this employee was 16, which resulted in rater 43 being flagged as an outlying rater in <a href="#t7">Table 7</a>.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Case 297 refers to a score of 23 for employee 188, given by rater 40. This score is the second highest score that was given by a rater in the entire employees' evaluation process. Furthermore, this was the only score that employee 188 received.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">In <a href="#t2">Table 2</a>, results were based on the crude average scores without any adjustment for the raters' effect. In <a href="#t4">Table 4</a> the employee performance predictor takes the average leniency/harshness of the associated rater into consideration. In <a href="#t6">Table 6</a> the consistency of the employee's ratings is taken into account. What is evident from <a href="#t2">Tables 2</a>, <a href="#t4">4</a> and <a href="#t6">6</a> is that the interest is in the true performance of the employee, not in an average score based on a few measures/rates about the employee's performance. 
The basic problem is that the observed score for an employee is not equal to the employee's true performance. How, then, should we estimate an employee's true performance latent value? The random effect in the mixed model links the rating to the true performance latent value, and the estimate of that latent value is typically the BLUP. As the number of ratings on an employee gets larger, the BLUP estimate becomes consistent and approaches the employee's true performance latent value. The results in <a href="#t4">Tables 4</a> and <a href="#t6">6</a> are sufficiently convincing to support the routine use of BLUP estimates in employees' appraisal, with the raters' effect treated as either fixed or random.</font></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>5 Conclusions and implications</b></font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Performance appraisal systems are essential for a company to run efficiently and productively. With performance appraisal in place, employees can be given a sense of ownership and responsibility with regard to the duties that they perform. The challenge is to know how best to adjust a given measure of an employee's performance so that it is not unduly influenced by a rater's tendency to make private and highly subjective assessments. Using a simple average of scores from a set of raters will not adjust for any hidden subjectivity that may reside in that specific group of raters. Because different employees are assessed by different raters, a subjective bias may be introduced into the rating of one employee when compared with that of another employee. This paper has sought to address this problem.</font></p>     ]]></body>
<body><![CDATA[<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The linear mixed model that has been applied in this study allows for some flexibility with regard to whether one wants to view a rater's effect as a fixed or a random effect. A rater effect can be treated as fixed if the raters are selected by the company with the purpose of comparing one rater with another. On the other hand, the raters' effect can be treated as random if we want to make statements about the variation in the overall population from which our raters are drawn.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Because we are interested both in effects that, we believe, are common to all individuals and in effects that differ among individuals, a mixed effects model can be used to capture both these features. The mixed model provides estimates (BLUPs) of each employee's true performance, which can then be subjected to a formal test to identify those employees who, statistically, are significantly good or bad performers in the company.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The model diagnostic tools that we have used provide some reassurance that the model is neither contradicted by the observed data nor unduly influenced by particular characteristics of the data. 
The results of this paper have consistently shown that, unless the same raters are evaluating all employees, there are considerable rater-based effects which cannot simply be ignored in any employees' performance appraisal.</font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b><i>Acknowledgement</i></b></font></p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The author is grateful to the anonymous reviewers and the managing editor for several important comments and suggestions. The author is also grateful to Prof Michael Murray and Dr Edilegnaw Wale for their careful reading of the first draft.</font></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>References</b></font></p>     <!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">AGUINIS, H. &amp; PIERCE, C.A. 2008. Enhancing the relevance of organizational behavior by embracing performance management research. <i>Journal of Organizational Behavior,</i> 29:139-145.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">BRAUN, H.I. 1988. Understanding score reliability: experiments in calibrating essay readers. 
<i>Journal of Educational Statistics,</i> 13:1-18.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">DE GRUIJTER, D.N. 1984. Two simple models for rater effects. <i>Applied Psychological Measurement,</i> 8:213-218.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">FERRIS, G.R., MUNYON, T.P., BASIK, K. &amp; BUCKLEY, M.R. 2008. The performance evaluation context: Social, emotional, cognitive, political, and relationship components. <i>Human Resource Management Review,</i> 18:146-163.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">FLEISS, J.L. 1986. 
<i>Design and analysis of clinical experiments.</i> New York: John Wiley &amp; Sons.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">GROTE, R.C. 1996. <i>The complete guide to performance appraisal.</i> New York: AMACOM, AMA's book publishing division.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">HARVILLE, D.A. 1990. BLUP (Best Linear Unbiased Prediction) and beyond. In <i>Advances in statistical methods for genetic improvement of livestock,</i> 239-276. New York: Springer-Verlag.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">HOFFMAN, B. &amp; WOEHR, D.J. 2009. Disentangling the meaning of multisource feedback source and dimension factors. 
<i>Personnel Psychology,</i> 62:735-765.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">HOFFMAN, B.J., LANCE, C., BYNUM, B. &amp; GENTRY, B. 2010. Rater source effects are alive and well after all. <i>Personnel Psychology,</i> 63:119-151.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">HOUSTON, W.M., RAYMOND, M.R. &amp; SVEC, J.C. 1991. Adjustments for rater effects. <i>Applied Psychological Measurement,</i> 15(4):409-421.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">KONDRASUK, J.N. 2011. So what would an ideal performance appraisal look like? 
<i>Journal of Applied Business and Economics,</i> 12(1):57-71.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">LITTLE, T.D., SCHNABEL, K.U. &amp; BAUMERT, J. 2000. <i>Modeling longitudinal and multilevel data.</i> London: Lawrence Erlbaum Associates Publishers.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">MARGRAVE, A. &amp; GORDEN, R. 2001. <i>The complete idiot's guide to performance appraisals.</i> New York: Alpha Books/Macmillan.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">McCULLOCH, C.E., SEARLE, S.R. &amp; CASELLA, G. 1996. <i>Variance components.</i> 
New York: John Wiley.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">McCULLOCH, C.E., SEARLE, S.R. &amp; NEUHAUS, J.M. 2008. <i>Generalized, linear, and mixed models</i> (2<sup>nd</sup> ed.). New York: John Wiley.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">OGUNFOWORA, B., BOURDAGE, J. &amp; LEE, K. 2010. Rater personality and performance dimension weighting in making overall performance judgments. <i>Journal of Business and Psychology,</i> 25:465-476.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">PULAKOS, E.D. 1986. The development of training programs to increase accuracy on different rating forms. 
<i>Organizational Behavior and Human Decision Processes,</i> 38:76-91.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">ROBINSON, G.K. 1991. That BLUP is a good thing: the estimation of random effects. <i>Statistical Science,</i> 6:15-51.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">RUSSELL, M. 2000. Summarizing change in test scores: shortcomings of three common methods. <i>Practical Assessment, Research &amp; Evaluation,</i> 7(5). Available at: <a href="http://pareonline.net/getvn.asp?v=7&amp;n=5" target="_blank">http://pareonline.net/getvn.asp?v=7&amp;n=5</a> &#91;Accessed 2009-09-16&#93;.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">SAS INSTITUTE 1992. SAS Technical Report P-229. 
Cary (North Carolina): SAS Institute Inc.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">SAXENA, S. 2010. Performance management system. <i>Global Journal of Management and Business Research,</i> 10(5):27-30.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">SHAPIRO, S.S. &amp; WILK, M.B. 1965. An analysis of variance test for normality (complete samples). <i>Biometrika,</i> 52:591-611.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">SKRONDAL, A. &amp; RABE-HESKETH, S. 2004. 
<i>Generalized latent variable modelling: multilevel, longitudinal and structural equation models.</i> London: Chapman and Hall.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">UGGERSLEV, K.L. &amp; SULSKY, L.M. 2008. Using frame-of-reference training to understand the implications of rater idiosyncrasy for rating accuracy. <i>Journal of Applied Psychology,</i> 93:711-719.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">WOEHR, D.J., SHEEHAN, M.K. &amp; BENNETT, W. 2005. Assessing measurement equivalence across ratings sources: a multitrait-multirater approach. <i>Journal of Applied Psychology,</i> 90:592-600.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">WOLFE, E.W. 2004. Identifying rater effects using latent trait models. 
<i>Psychology Science,</i> 46:35-51.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">WOLFE, E.W., MOULDER, B.C. &amp; MYFORD, C.M. 2001. Detecting differential rater functioning over time (DRIFT) using a Rasch multi-faceted rating scale model. <i>Journal of Applied Measurement,</i> 2:256-280.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">ZEWOTIR, T. 2001. Influence diagnostics in mixed models. PhD thesis: University of the Witwatersrand.</font><!-- end-ref --><!-- ref --><p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">ZEWOTIR, T. &amp; GALPIN, J.S. 2004. The behaviour of normality under non-normality for mixed models. 
<i>South African Statistical Journal,</i> 38:115-138.</font><!-- end-ref --><p>&nbsp;</p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Accepted: September 2011</font></p>     <p>&nbsp;</p>     ]]></body>
<body><![CDATA[<p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a name="back1"></a><a href="#top1">1</a>    The name of the company could not be disclosed for anonymity reasons.</font></p>      ]]></body>
<REFERENCES></REFERENCES><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[AGUINIS]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[PIERCE]]></surname>
<given-names><![CDATA[C.A]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Enhancing the relevance of organizational behavior by embracing performance management research]]></article-title>
<source><![CDATA[Journal of Organizational Behavior]]></source>
<year>2008</year>
<volume>29</volume>
<page-range>139-145</page-range></nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[BRAUN]]></surname>
<given-names><![CDATA[H.I]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Understanding score reliability: experiments in calibrating essay readers]]></article-title>
<source><![CDATA[Journal of Educational Statistics]]></source>
<year>1988</year>
<volume>13</volume>
<page-range>1-18</page-range></nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[DE GRUIJTER]]></surname>
<given-names><![CDATA[D.N]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Two simple models for rater effects]]></article-title>
<source><![CDATA[Applied Psychological Measurement]]></source>
<year>1984</year>
<volume>8</volume>
<page-range>213-218</page-range></nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[FERRIS]]></surname>
<given-names><![CDATA[G.R]]></given-names>
</name>
<name>
<surname><![CDATA[MUNYON]]></surname>
<given-names><![CDATA[T.P]]></given-names>
</name>
<name>
<surname><![CDATA[BASIK]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
<name>
<surname><![CDATA[BUCKLEY]]></surname>
<given-names><![CDATA[M.R]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[The performance evaluation context: Social, emotional, cognitive, political, and relationship components]]></article-title>
<source><![CDATA[Human Resource Management Review]]></source>
<year>2008</year>
<volume>18</volume>
<page-range>146-163</page-range></nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[FLEISS]]></surname>
<given-names><![CDATA[J.L]]></given-names>
</name>
</person-group>
<source><![CDATA[Design and analysis of clinical experiments]]></source>
<year>1986</year>
<publisher-loc><![CDATA[New York ]]></publisher-loc>
<publisher-name><![CDATA[John Wiley & Sons]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[GROTE]]></surname>
<given-names><![CDATA[R.C]]></given-names>
</name>
</person-group>
<source><![CDATA[The complete guide to performance appraisal]]></source>
<year>1996</year>
<publisher-loc><![CDATA[New York ]]></publisher-loc>
<publisher-name><![CDATA[AMACOM, AMA's book publishing division]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[HARVILLE]]></surname>
<given-names><![CDATA[D.A]]></given-names>
</name>
</person-group>
<source><![CDATA[BLUP (Best Linear Unbiased Prediction) and beyond: In advances in statistical methods for genetic improvement of livestock]]></source>
<year>1990</year>
<page-range>239-276</page-range><publisher-loc><![CDATA[New York ]]></publisher-loc>
<publisher-name><![CDATA[Springer-Verlag]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[HOFFMAN]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
<name>
<surname><![CDATA[WOEHR]]></surname>
<given-names><![CDATA[D.J]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Disentangling the meaning of multisource feedback source and dimension factors]]></article-title>
<source><![CDATA[Personnel Psychology]]></source>
<year>2009</year>
<volume>62</volume>
<page-range>735-765</page-range></nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[HOFFMAN]]></surname>
<given-names><![CDATA[B.J]]></given-names>
</name>
<name>
<surname><![CDATA[LANCE]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[BYNUM]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
<name>
<surname><![CDATA[GENTRY]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Rater source effects are alive and well after all]]></article-title>
<source><![CDATA[Personnel Psychology]]></source>
<year>2010</year>
<volume>63</volume>
<page-range>119-151</page-range></nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[HOUSTON]]></surname>
<given-names><![CDATA[W.M]]></given-names>
</name>
<name>
<surname><![CDATA[RAYMOND]]></surname>
<given-names><![CDATA[M.R]]></given-names>
</name>
<name>
<surname><![CDATA[SVEC]]></surname>
<given-names><![CDATA[J.C]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Adjustments for rater effects]]></article-title>
<source><![CDATA[Applied Psychological Measurement]]></source>
<year>1991</year>
<volume>15</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>409-421</page-range></nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[KONDRASUK]]></surname>
<given-names><![CDATA[J.N]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[So what would an ideal performance appraisal look like?]]></article-title>
<source><![CDATA[Journal of Applied Business and Economics]]></source>
<year>2011</year>
<volume>12</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>57-71</page-range></nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[LITTLE]]></surname>
<given-names><![CDATA[T.D]]></given-names>
</name>
<name>
<surname><![CDATA[SCHNABEL]]></surname>
<given-names><![CDATA[K.U]]></given-names>
</name>
<name>
<surname><![CDATA[BAUMERT]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<source><![CDATA[Modeling longitudinal and multilevel data]]></source>
<year>2000</year>
<publisher-loc><![CDATA[London ]]></publisher-loc>
<publisher-name><![CDATA[Lawrence Erlbaum Associates Publishers]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[MARGRAVE]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[GORDEN]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
</person-group>
<source><![CDATA[The complete idiot's guide to performance appraisals]]></source>
<year>2001</year>
<publisher-loc><![CDATA[New York ]]></publisher-loc>
<publisher-name><![CDATA[Alpha Books/Macmillan]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[McCULLOCH]]></surname>
<given-names><![CDATA[C.E]]></given-names>
</name>
<name>
<surname><![CDATA[SEARLE]]></surname>
<given-names><![CDATA[S.R]]></given-names>
</name>
<name>
<surname><![CDATA[CASELLA]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
</person-group>
<source><![CDATA[Variance components]]></source>
<year>1996</year>
<publisher-loc><![CDATA[New York ]]></publisher-loc>
<publisher-name><![CDATA[John Wiley]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[McCULLOCH]]></surname>
<given-names><![CDATA[C.E]]></given-names>
</name>
<name>
<surname><![CDATA[SEARLE]]></surname>
<given-names><![CDATA[S.R]]></given-names>
</name>
<name>
<surname><![CDATA[NEUHAUS]]></surname>
<given-names><![CDATA[J.M]]></given-names>
</name>
</person-group>
<source><![CDATA[Generalized, linear, and mixed models]]></source>
<year>2008</year>
<edition>2nd</edition>
<publisher-loc><![CDATA[New York]]></publisher-loc>
<publisher-name><![CDATA[John Wiley]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[OGUNFOWORA]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
<name>
<surname><![CDATA[BOURDAGE]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[LEE]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Rater personality and performance dimension weighting in making overall performance judgments]]></article-title>
<source><![CDATA[Journal of Business and Psychology]]></source>
<year>2010</year>
<volume>25</volume>
<page-range>465-476</page-range></nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[PULAKOS]]></surname>
<given-names><![CDATA[E.D]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[The development of training programs to increase accuracy with different rating tasks]]></article-title>
<source><![CDATA[Organizational Behavior and Human Decision Processes]]></source>
<year>1986</year>
<volume>38</volume>
<page-range>76-91</page-range></nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[ROBINSON]]></surname>
<given-names><![CDATA[G.K]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[That BLUP is a good thing: the estimation of random effects]]></article-title>
<source><![CDATA[Statistical Science]]></source>
<year>1991</year>
<volume>6</volume>
<page-range>15-51</page-range></nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[RUSSELL]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Summarizing change in test scores: shortcomings of three common methods]]></article-title>
<source><![CDATA[Practical Assessment, Research & Evaluation]]></source>
<year>2000</year>
<volume>7</volume>
<numero>5</numero>
<issue>5</issue>
</nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="book">
<collab>SAS INSTITUTE</collab>
<source><![CDATA[SAS Technical Report P-229]]></source>
<year>1992</year>
<publisher-loc><![CDATA[North Carolina]]></publisher-loc>
<publisher-name><![CDATA[SAS Institute Inc]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B21">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[SAXENA]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Performance management system]]></article-title>
<source><![CDATA[Global Journal of Management and Business Research]]></source>
<year>2010</year>
<volume>10</volume>
<numero>5</numero>
<issue>5</issue>
<page-range>27-30</page-range></nlm-citation>
</ref>
<ref id="B22">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[SHAPIRO]]></surname>
<given-names><![CDATA[S.S]]></given-names>
</name>
<name>
<surname><![CDATA[WILK]]></surname>
<given-names><![CDATA[M.B]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[An analysis of variance test for normality (complete samples)]]></article-title>
<source><![CDATA[Biometrika]]></source>
<year>1965</year>
<volume>52</volume>
<page-range>591-611</page-range></nlm-citation>
</ref>
<ref id="B23">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[SKRONDAL]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[RABE-HESKETH]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<source><![CDATA[Generalized latent variable modelling: multilevel, longitudinal and structural equation models]]></source>
<year>2004</year>
<publisher-loc><![CDATA[London ]]></publisher-loc>
<publisher-name><![CDATA[Chapman and Hall]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B24">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[UGGERSLEV]]></surname>
<given-names><![CDATA[K.L]]></given-names>
</name>
<name>
<surname><![CDATA[SULSKY]]></surname>
<given-names><![CDATA[L.M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Using frame-of-reference training to understand the implications of rater idiosyncrasy for rating accuracy]]></article-title>
<source><![CDATA[Journal of Applied Psychology]]></source>
<year>2008</year>
<volume>93</volume>
<page-range>711-719</page-range></nlm-citation>
</ref>
<ref id="B25">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[WOEHR]]></surname>
<given-names><![CDATA[D.J]]></given-names>
</name>
<name>
<surname><![CDATA[SHEEHAN]]></surname>
<given-names><![CDATA[M.K]]></given-names>
</name>
<name>
<surname><![CDATA[BENNETT]]></surname>
<given-names><![CDATA[W]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Assessing measurement equivalence across ratings sources: a multitrait-multirater approach]]></article-title>
<source><![CDATA[Journal of Applied Psychology]]></source>
<year>2005</year>
<volume>90</volume>
<page-range>592-600</page-range></nlm-citation>
</ref>
<ref id="B26">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[WOLFE]]></surname>
<given-names><![CDATA[E.W]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Identifying rater effects using latent trait models]]></article-title>
<source><![CDATA[Psychology Science]]></source>
<year>2004</year>
<volume>46</volume>
<page-range>35-51</page-range></nlm-citation>
</ref>
<ref id="B27">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[WOLFE]]></surname>
<given-names><![CDATA[E.W]]></given-names>
</name>
<name>
<surname><![CDATA[MOULDER]]></surname>
<given-names><![CDATA[B.C]]></given-names>
</name>
<name>
<surname><![CDATA[MYFORD]]></surname>
<given-names><![CDATA[C.M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Detecting differential rater functioning over time (DRIFT) using a Rasch multi-faceted rating scale model]]></article-title>
<source><![CDATA[Journal of Applied Measurement]]></source>
<year>2001</year>
<volume>2</volume>
<page-range>256-280</page-range></nlm-citation>
</ref>
<ref id="B28">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[ZEWOTIR]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
</person-group>
<source><![CDATA[Influence diagnostics in mixed models]]></source>
<year>2001</year>
</nlm-citation>
</ref>
<ref id="B29">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[ZEWOTIR]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[GALPIN]]></surname>
<given-names><![CDATA[J.S]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[The behaviour of normality under non-normality for mixed models]]></article-title>
<source><![CDATA[South African Statistical Journal]]></source>
<year>2004</year>
<volume>38</volume>
<page-range>115-138</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
