I really should have gotten back to this sooner, but for those who are wondering how things went with SWoRD, the peer review writing site I used with my writing class in the spring: my overall reaction is that while it might be useful for some people, I probably won't use it the next time around. For those who missed my earlier posts, back in March I discussed the basics of SWoRD, whether SWoRD can replace instructor grading, and some first reactions to SWoRD's reviewing process (after the first assignment). I made some tweaks as the semester progressed, but overall, I have to say the experience was still pretty rough.
To briefly recap, SWoRD is an online peer review system in which:

1) students upload their papers,
2) the system randomly assigns other students to anonymously review those papers,
3) peer reviewers give both open-ended comments and numeric ratings in response to instructor-generated prompts,
4) authors 'back evaluate' their reviews, meaning they give a numeric rating of how helpful the open-ended comments were, and
5) the system uses the reviewers' numeric ratings to generate a writing score for each author, and the authors' back evaluation ratings to generate a reviewing score for each reviewer.

That last step, having the writing and reviewing scores generated entirely by the students themselves, is the main benefit of SWoRD relative to other online peer review options like Calibrated Peer Review or Turnitin's PeerMark. In my opinion, though, the system has some problems that make those grades somewhat suspect, and unfortunately, I'm not sure there really is any satisfactory way to automate that process.
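To make the two kinds of scores concrete, here is a minimal sketch of how a system like this could combine them. This is purely my own illustration in Python; the function names, the plain averaging, and the 50/50 weighting are my assumptions, not SWoRD's actual algorithm.

```python
from statistics import mean

def writing_score(reviewer_ratings):
    """Combine the numeric ratings a paper received from its peer
    reviewers into a single writing score (here, a simple average)."""
    return mean(reviewer_ratings)

def reviewing_score(accuracy, helpfulness_ratings, accuracy_weight=0.5):
    """Combine an 'accuracy' component (how well a reviewer's ratings
    track the rest of the class) with the back evaluation ratings that
    the reviewer's comments received from authors."""
    helpfulness = mean(helpfulness_ratings)
    return accuracy_weight * accuracy + (1 - accuracy_weight) * helpfulness

# A paper rated 5, 6, and 4 (out of 7) by its three reviewers:
print(writing_score([5, 6, 4]))        # 5.0
# A reviewer with accuracy 6.0 whose comments were rated 4 and 5 by authors:
print(reviewing_score(6.0, [4, 5]))    # 5.25
```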
"Bad" reviewers may not be penalized
For starters, my original understanding of how the SWoRD grading system works was incorrect. I had relied on research papers posted on the SWoRD site (published a few years ago), but the system has since been changed, and the change is not explained anywhere on the site. The earlier papers said that the writing grades were weighted in such a way that if one reviewer's score was substantially different from the other reviewers' scores, that score would be given less weight. That is not actually the case, which I discovered when one of my better students kept bugging me about his grade on one particular assignment. When I looked at the scores, there was one reviewer who gave 1s and 2s (out of 7) to all the papers he reviewed. Since that reviewer also did not provide very helpful comments, my guess is that he was either confused about the scoring or just lazy and not taking it seriously. Based on my original understanding, I thought the fact that his scores were so much lower than the other reviewers' should have lowered his 'accuracy' reviewing grade, and that his scores should have been given much less weight for the students he reviewed. Neither of those things happened: his reviewing grade was actually somewhat higher than the class average, and his scores definitely dragged down the writing scores for those papers.

When I asked the SWoRD team about this, the response was that the "accuracy" part of the reviewing grade is based on rank orderings, not on a comparison to the other ratings; that is, as long as a reviewer gives higher ratings to 'better' papers and lower ratings to 'worse' papers, the system considers the ratings 'accurate'. The message from the SWoRD team said that they had "decided it wasn't valid to penalize someone for using a different range of the scale because often they were actually the most valid rater, with other students rating too high overall. If the instructor decides [a student] was unreasonably harsh, the thing to do is give [that student] a lower reviewing grade."

On the one hand, I understand why they made that change, since I definitely noticed that my better students tended to give somewhat lower scores, on average (along with better comments justifying those scores), than their classmates. On the other hand, if I have to go through and scrutinize all the scores to see whether students are scoring appropriately, that seems to defeat the whole purpose of having the scoring algorithm in the first place.
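To see why a rank-order check doesn't penalize a uniformly harsh reviewer, here is a rough sketch of the difference between the two approaches. The use of Spearman rank correlation (and scipy) is my own stand-in for however SWoRD actually computes 'accuracy', and the numbers are made up for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

# Consensus ratings for four papers (say, the mean of the other reviewers)
consensus = np.array([6.0, 5.0, 4.0, 3.0])

# A harsh reviewer who only uses the bottom of the 7-point scale but still
# puts the papers in roughly the same order as everyone else
harsh = np.array([2.0, 2.0, 1.0, 1.0])

# Rank-order 'accuracy': high, because the orderings mostly agree
rank_accuracy, _ = spearmanr(harsh, consensus)
print(rank_accuracy)                       # ~0.89 -- looks 'accurate'

# Deviation-based check (what I originally thought SWoRD did): large,
# because the absolute scores sit far below the consensus
print(np.mean(np.abs(harsh - consensus)))  # 3.0 points on a 7-point scale
```

Under the rank-order logic, the harsh reviewer's ratings count as accurate and get averaged in at full weight, which is exactly what I saw in my class.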
Incomplete information for back evaluations
Based on my reading of the research papers, in earlier versions of the system students could not submit back evaluations until after they had turned in their second draft, but they did see both the comments and the numeric ratings from the reviewers (requiring the second draft before the back evaluations was a way to make sure students actually had to process the comments before evaluating them). In the current version, students do not get to see the numeric ratings until after they have submitted their back evaluations. Again, I can understand why this change was made; I can certainly imagine that some students would 'retaliate' for low ratings by giving low back evaluation scores. But on the other hand, I saw many instances where reviewers gave ratings that were not consistent with, or explained by, their open-ended comments (for example, a vague comment that 'everything looks fine' followed by a score of 3 or 4 out of 7). In my opinion, those reviewers should be given lower reviewing scores, but the only way to accomplish that would be for the instructor to go in and manually review all the scores and comments, again defeating the purpose of having the scoring automated.
Reviewing itself is useful (but I'm still learning)
Given the problems with the scoring, I was expecting more negative comments from the students at the end of the semester, but their evaluations of the system were actually relatively positive, though fewer than half thought I should continue to use it in the future. Many of the critical comments were about the reviewing process itself (e.g., wanting more guidance on how to do good reviews, feeling that classmates didn't take it seriously enough or didn't give useful feedback, saying they should only have to review three papers instead of four or five) rather than about the SWoRD system. The SWoRD-specific comments had to do with things like the 9pm deadlines, which were hard for students to remember (not something the instructor can change), or the files being converted to PDFs, which made it hard to refer to specific points in the papers (versus hard copies or Word docs that could be marked up). But students did seem to see the value in the reviewing, with several commenting that doing the reviews helped them see where their own papers needed improvement.
So, to sum up, I do think the SWoRD system can still be useful for some instructors; if nothing else, it provides an infrastructure for students to submit papers, have reviewers randomly and anonymously assigned, and give and get feedback from multiple readers, and you don't have to use the scores the system generates. I think SWoRD could be particularly good for shorter assignments where the evaluation criteria are relatively objective (and thus reviews might be more consistent). But if you aren't going to use the grades generated by the system, there may be other, better tools for facilitating peer review; I'll talk about some of those options in my next post...