
SWoRD follow-up

I really should have gotten back to this sooner. For those wondering how things went with SWoRD, the peer review writing site I used with my writing class in the spring: it might be useful for some people, but I probably won't use it the next time around. For those who missed my earlier posts, back in March I discussed the basics of SWoRD, whether SWoRD can replace instructor grading, and some first reactions to SWoRD's reviewing process (after the first assignment). I made some tweaks as the semester progressed, but overall the experience was still pretty rough.

To briefly recap, SWoRD is an online peer review system where 1) students upload their papers, 2) the system randomly assigns other students to anonymously review those papers, 3) peer reviewers give both open-ended comments and numeric ratings in response to instructor-generated prompts, 4) authors 'back evaluate' their reviews, which means they give a numeric rating of how helpful the open-ended comments were, and 5) the system uses the numeric ratings from the reviewers to generate a writing score for the authors and uses the back evaluation ratings from the authors to generate a reviewing score for the reviewers. That last step, having the writing and reviewing scores generated entirely from the students themselves, is the main benefit of SWoRD, relative to other online peer review options like Calibrated Peer Review or Turnitin's PeerMark. My opinion is that the system has some problems that make those grades somewhat suspect. Unfortunately, I'm not sure there really is any satisfactory way to automate that process.

"Bad" reviewers may not be penalized
For starters, my original understanding of how the SWoRD grading system works was incorrect. I relied on some research papers posted on the SWoRD site (published a few years ago); the system has since been changed, but that is not explained anywhere on the site. The earlier papers said that the writing grades were weighted so that if the score from one reviewer was substantially different from the scores from other reviewers, that score would be given less weight. That is not actually the case, which I discovered when one of my better students kept bugging me about his grade on one particular assignment. When I looked at the scores, there was one reviewer who gave 1's and 2's (out of 7) to all the papers he reviewed. Since that reviewer also did not provide very helpful comments, my guess is that he was either confused about the scoring or just lazy and not taking it seriously. Based on my original understanding, I thought the fact that his scores were so much lower than the other reviewers' should have lowered his 'accuracy' reviewing grade, and that his scores should have been given a lot less weight for the students he reviewed. Neither of those things happened: his reviewing grade was actually somewhat higher than the class average, and his scores definitely reduced the writing scores for those papers.

When I asked the SWoRD team about this, the response was that the 'accuracy' part of the reviewing grade is based on rank orderings, not a comparison to the other ratings; that is, as long as the reviewer is giving higher ratings to 'better' papers and lower ratings to 'worse' papers, the system considers the ratings to be 'accurate'. The message from the SWoRD team said that they had "decided it wasn't valid to penalize someone for using a different range of the scale because often they were actually the most valid rater, with other students rating too high overall. If the instructor decides [a student] was unreasonably harsh, the thing to do is give [that student] a lower reviewing grade."

On the one hand, I understand why they made that change, since I definitely noticed that my better students tended to give somewhat lower scores, on average (along with better comments justifying their scores), than their classmates. On the other hand, if I have to go through and scrutinize all the scores to see if students are scoring appropriately, that seems to defeat the whole purpose of having the scoring algorithm in the first place.
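To make the rank-ordering point concrete, here is a rough Python sketch. It is not SWoRD's actual algorithm (the site does not document one); it assumes 'accuracy' behaves something like a Spearman rank correlation against the other reviewers' average, and the function names and all of the ratings are made up for illustration.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either list has no variation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def rank_agreement(reviewer, peers):
    """Assumed stand-in for rank-based 'accuracy': Spearman correlation."""
    return pearson(ranks(reviewer), ranks(peers))

def level_gap(reviewer, peers):
    """Average absolute distance from the peers' ratings (scale points)."""
    return mean(abs(r - p) for r, p in zip(reviewer, peers))

# Hypothetical ratings for four papers on the 1-7 scale.
peer_average = [5.5, 4.0, 6.5, 5.0]
harsh_reviewer = [2, 1, 2, 1]       # only 1's and 2's, but roughly the same ordering
scrambled_reviewer = [5, 6, 4, 6]   # similar range to peers, but the ordering is off

for name, ratings in [("harsh", harsh_reviewer), ("scrambled", scrambled_reviewer)]:
    print(name,
          "rank agreement:", round(rank_agreement(ratings, peer_average), 2),
          "level gap:", round(level_gap(ratings, peer_average), 2))
```

Under these assumptions, the harsh reviewer agrees closely with the peers on ordering even though his ratings sit several scale points lower, while the scrambled reviewer is closer in level but disagrees on ordering. A rank-only measure rewards the first and penalizes the second, which is consistent with the behavior described above.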

Incomplete information for back evaluations
Based on my reading of the research papers, in the earlier versions of the system, students could not submit back evaluations until after they turned in their second draft, but they did see both the comments and the numeric scores from the reviewers (requiring them to turn in the second draft before doing the back evaluations was a way to make sure students actually had to process the comments before evaluating them). In the current version, students do not get to see the reviewers' numeric scores until after they have submitted their back evaluations. Again, I can understand why this change was made; I can certainly imagine that some students would 'retaliate' for low scores by giving low back evaluation scores. But on the other hand, I saw many instances where reviewers gave scores that were not consistent with, or explained by, their open-ended comments (for example, a vague comment that 'everything looks fine' followed by a score of 3 or 4 out of 7). In my opinion, those reviewers should be given lower reviewing scores, but the only way to accomplish this would be for the instructor to go in and manually review all the scores and comments, again defeating the purpose of having the scoring automated.

Reviewing itself is useful (but I'm still learning)
Given the problems with the scoring, I was expecting more negative comments from the students at the end of the semester, but evaluations of the system were actually relatively positive, though less than half thought I should continue to use it in the future. Many of the critical comments were about the reviewing process itself (e.g., wanting more guidance for how to do good reviews, feeling like classmates didn't take it seriously enough or didn't give useful feedback, saying they should only review three papers instead of four or five, etc.), rather than the SWoRD system. The SWoRD-specific comments had to do with things like the deadlines being at 9pm, which was hard for students to remember (this isn't something the instructor can change), or the files being converted to PDFs, which made it hard to refer to specific points in the papers (versus hard copies or Word docs that could be marked up). But students did seem to see the value in the reviewing, with several students commenting that doing the reviews helped them see where their own papers needed improvement.

So to sum up, I do think that the SWoRD system can still be useful for some instructors; if nothing else, it provides an infrastructure for students to submit papers, have reviewers randomly and anonymously assigned, and give/get feedback from multiple readers. You don't have to use the scores that the system generates. I particularly think SWoRD could be good for shorter assignments, where the evaluation criteria are relatively objective (and thus reviews might be more consistent). But if you aren't going to use the grades generated by the system, I think there may be other, better tools that could be used to facilitate peer reviewing; I'll talk about some of those options in my next post...
