Walden University Test Items for Depression Discussion

Description

 

 

In a scholarly community, critique is an important process that fosters the spread of ideas and information, improves quality of work, and encourages academic discourse. Critiques should be grounded in academic knowledge, current literature, and professional experience, rather than unsupported opinions. In test development, specifically, experts may be called upon to write items for a test or to critique the items written by others. In this Discussion, you have the opportunity to share the test items you developed in Week 5 with your colleagues and provide constructive feedback on each other’s work. As you review your colleagues’ items, think about how well the items measure the construct and whether the items are clear and unambiguous.
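One way to make such a critique concrete is a simple item analysis. The sketch below is illustrative only: the response data are simulated (the 20-item, six-point 0–5 scale merely mirrors the kind of depression measure discussed in this assignment), and the numbers are assumptions, not data from any real instrument. It computes each item's corrected item-total correlation and Cronbach's alpha, two standard checks on whether items hang together as a measure of one construct.

```python
# Item-analysis sketch for a hypothetical 20-item Likert-type depression scale.
# Response data are simulated; all numbers are illustrative assumptions.
import random

random.seed(1)
n_people, n_items = 200, 20

# Simulate 0-5 ratings driven by a latent "depression" level per respondent,
# plus noise, so items correlate with the total score.
data = []
for _ in range(n_people):
    latent = random.gauss(0, 1)
    row = [min(5, max(0, round(2.5 + 1.2 * latent + random.gauss(0, 1))))
           for _ in range(n_items)]
    data.append(row)

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def corr(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = (sum((x - mx) ** 2 for x in xs) *
             sum((y - my) ** 2 for y in ys)) ** 0.5
    return cov / denom

totals = [sum(row) for row in data]

# Corrected item-total correlation: each item against the total WITHOUT itself.
# A low or negative value flags an item that may not measure the construct.
rs = []
for i in range(n_items):
    item = [row[i] for row in data]
    rest = [t - x for t, x in zip(totals, item)]
    rs.append(corr(item, rest))

# Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / variance of total)
k = n_items
item_vars = [var([row[i] for row in data]) for i in range(k)]
alpha = (k / (k - 1)) * (1 - sum(item_vars) / var(totals))

print(f"item-total r range: {min(rs):.2f} to {max(rs):.2f}")
print(f"Cronbach's alpha:   {alpha:.2f}")
```

An item whose corrected item-total correlation is near zero or negative is a prime candidate for revision during peer critique.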

Test Items

The test will contain 20 items. The test items for families and individuals who have attempted suicide address: how often one has lost interest in aspects of life that used to be very important; feeling that the future is hopeless; difficulty making decisions; agitation and an inability to stay still; feeling blue, sad, and unhappy; the current level of concentration when reading; doing things slowly; feeling that the joy and pleasure of life are gone; feeling fatigued; needing great effort to do simple things; feeling like a guilty person who deserves punishment; feeling more lifeless than before; sleep that is broken, disturbed, or excessive; spending a lot of time thinking about how one would kill oneself; feeling depressed even in good situations; loss of interest in food; weight gain or loss; feeling caught or trapped; loss of interest in friends or family members; feeling fidgety or restless; feeling worthless; and feeling so nervous that nothing can bring calm (Garber et al., 2016). Each of the items about the feelings and behaviors of the families and individuals helps in understanding the severity and level of depression and in drawing conclusions about the possibility of suicide (Garber et al., 2016). The respondents, including family members and individuals who have attempted suicide, would rate each item as not at all, just a little, somewhat, moderately, quite a lot, or very much. These ratings are used to score the depression as severe, moderate, borderline, possibly mild, or absent (Tommasi, Ferrara, & Saggino, 2018). This helps victims and their families devise ways of managing the depression.

References

Garber, J., Brunwasser, S. M., Zerr, A. A., Schwartz, K. T., Sova, K., & Weersing, V. R. (2016). Treatment and prevention of depression and anxiety in youth: Test of cross-over effects. Depression and Anxiety, 33(10), 939–959.

Tommasi, M., Ferrara, G., & Saggino, A. (2018). Application of Bayes' theorem in valuating depression tests performance. Frontiers in Psychology, 9, 1240.

JOURNAL OF EDUCATIONAL MEASUREMENT, VOLUME 23, NO. 2, SUMMER 1986, pp. 171–173

CAN A TEST BE TOO RELIABLE?

HOWARD WAINER
Educational Testing Service

It is shown that summary statistics that are commonly used to measure test quality (reliability, mean rbis, and mean proportion correct) can be seriously misleading. This is demonstrated and explained.

Being too reliable, like being too rich or too thin, is a state that psychometricians have long been taught is impossible to attain. Yet, can an overlarge reliability be indicative of an error? Since Spearman and Brown it has been well known that with most well-developed tests, reliability and test length go hand in hand. In fact, one can often make a fairly accurate guess as to the reliability of a test just by knowing its length (see Gulliksen, 1950, p. 81, for a graph that allows one to do this). Thus, when a relatively short test shows unusually high reliability it should be cause for concern rather than unbridled jubilation.

To illustrate this, consider an example from our own unfortunate and immediate past. During the course of an item analysis of a 28-item test (n = 2,450) we found a respectable α of .83. More than respectable, it was downright wonderful. With a mean score of 51% correct and a mean rbis of .45, it looked, on the surface, like a test that could stand proudly in the glare of public scrutiny, a shining example of the test maker's skill. A deeper look disappointed us, and taught us a lesson that seems valuable to share.

Shown in Figure 1 is a stem-and-leaf diagram of the item-total biserial correlation coefficients (rbis) for the 28 items of the test.
[Figure 1. Stem-and-leaf display of biserial correlations for the 28 items: several values are very high, around .8 to .9, and a few are negative.]

Note that some are very high, astonishingly so, and a few others are negative. This latter aspect of the test led us to question the results. We were told that there must be a mistake since "we do not give tests with negative rbis." At this point, we looked more carefully at the score distribution (see Figure 2). The "sore thumb" that sticks out at the low end of the score distribution confirmed our suspicions that something was amiss. But what?

The test had 28 items with four choices each, thus a score of 7 represented chance. The peak of the first mode in the score distribution was at 9, just above chance. This provided a further hint. Further detective work revealed what had happened. The data we had analyzed as a single test form were, in fact, two different forms. One form, whose key we used in scoring, had 1,356 examinees. The other had 1,094. Scoring a test with an inappropriate key would yield scores for the affected examinees in the vicinity of chance. In actual fact, we discovered that exactly 9 of the 28 items had the same keyed correct response, so that someone who achieved a perfect score on the incorrectly scored form would have an observed score of 9. Note that the lower mode of the obtained score distribution is 9.

[Figure 2. Distribution of raw scores for the 2,450 examinees (frequency by test score, 0–30): the distribution is bimodal, with a secondary mode at the low end.]

Once we realized the cause of the problem, the explanation for the various observed anomalies became apparent. For example, the biserial correlations would be very high when the two keys disagreed, because those who scored low (near chance) would get the item wrong, and those who scored high would get it right. Similarly, the negative biserials would occur when the two keys agreed, for then the low scoring individuals would get the item correct. This problem became even worse when the item was relatively difficult on the correctly scored form, and relatively easy on the incorrectly scored form.

Last, we return to our original topic, the reliability. No wonder it was high. The score distribution was broadened, yielding a large group with very low scores and another group with high scores. Any split of the test would yield the same picture. When the test was rescored properly, the reliability dropped to the more usual .6 level for each form.

It seems clear that when a test has a reliability that is too low the test will be scrutinized. The purpose of this note was to emphasize the importance of giving the test close scrutiny when the reliability is too high as well. The problem will probably be of a different cause, but it may be a problem nonetheless. The example presented is but one case of this. It emphasizes that summary statistics for the whole test (mean p+, mean rbis, α) are not sufficient for judging the quality of the test.

My thanks to Paul Holland.

REFERENCE

Gulliksen, H. (1950). Theory of mental tests. New York: John Wiley.

AUTHOR

HOWARD WAINER, Senior Research Scientist, Educational Testing Service, Princeton, NJ 08541. Degrees: BS, Rensselaer Polytechnic Institute; AM, PhD, Princeton University. Specializations: Statistics, psychometrics, graphics.
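The failure mode Wainer describes can be reproduced with a small simulation. The sketch below is an illustrative reconstruction, not his data: only the counts (28 four-choice items, 1,356 and 1,094 examinees, 9 items with matching keys) come from the article, while the response-probability model (0.25 + 0.65 × ability for a correctly keyed item) is an assumption. Pooling a correctly keyed form with a form scored against the wrong key produces a near-chance cluster of scores, and coefficient alpha for the pooled data exceeds the alpha of the properly scored form.

```python
# Illustrative reconstruction (assumed probability model, not Wainer's data)
# of pooling two 28-item, four-choice forms scored with one form's key.
import random

random.seed(7)
N_ITEMS, N_A, N_B, SHARED = 28, 1356, 1094, 9  # SHARED: items where both keys agree

def simulate_group(n, correctly_keyed):
    """Return a 0/1 scored-response matrix for one examinee group."""
    rows = []
    for _ in range(n):
        ability = random.random()  # crude ability level in [0, 1]
        row = []
        for i in range(N_ITEMS):
            if correctly_keyed or i < SHARED:
                p = 0.25 + 0.65 * ability   # knowledge raises P(match the key)
            else:
                p = 0.25 * (1.0 - ability)  # wrong key: knowledge lowers it
            row.append(1 if random.random() < p else 0)
        rows.append(row)
    return rows

def cronbach_alpha(rows):
    k = len(rows[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    totals = [sum(r) for r in rows]
    item_var_sum = sum(var([r[i] for r in rows]) for i in range(k))
    return (k / (k - 1)) * (1 - item_var_sum / var(totals))

form_a = simulate_group(N_A, True)    # scored with its own key
form_b = simulate_group(N_B, False)   # scored with form A's key
pooled = form_a + form_b

alpha_pooled = cronbach_alpha(pooled)
alpha_clean = cronbach_alpha(form_a)
mean_b = sum(sum(r) for r in form_b) / N_B  # sits near chance (28/4 = 7)

print(f"alpha, mis-keyed pooled data: {alpha_pooled:.2f}")
print(f"alpha, correctly keyed form:  {alpha_clean:.2f}")
print(f"mean score, wrong-key group:  {mean_b:.1f}")
```

The inflated pooled alpha arises exactly as the note explains: the mis-keyed group broadens the score distribution, so any split of the test shows the same high-versus-low separation.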

  Excellent Good Fair Poor
Main Posting

Excellent: 45 (45%) – 50 (50%)

Answers all parts of the discussion question(s) with reflective critical analysis and synthesis of knowledge gained from the course readings for the module and current credible sources.

 

Supported by at least three current, credible sources.

 

Written clearly and concisely with no grammatical or spelling errors and fully adheres to current APA manual writing rules and style.

Good: 40 (40%) – 44 (44%)

Responds to the discussion question(s) and is reflective with critical analysis and synthesis of knowledge gained from the course readings for the module.

 

At least 75% of post has exceptional depth and breadth.

 

Supported by at least three credible sources.

 

Written clearly and concisely with one or no grammatical or spelling errors and fully adheres to current APA manual writing rules and style.

Fair: 35 (35%) – 39 (39%)

Responds to some of the discussion question(s).

 

One or two criteria are not addressed or are superficially addressed.

 

Somewhat lacks reflection, critical analysis, and synthesis.

 

Somewhat represents knowledge gained from the course readings for the module.

 

Post is cited with two credible sources.

 

Written somewhat concisely; may contain more than two spelling or grammatical errors.

 

Contains some APA formatting errors.

Poor: 0 (0%) – 34 (34%)

Does not respond to the discussion question(s) adequately.

 

Lacks depth or superficially addresses criteria.

 

Lacks reflection and critical analysis and synthesis.

 

Does not represent knowledge gained from the course readings for the module.

 

Contains only one or no credible sources.

 

Not written clearly or concisely.

 

Contains more than two spelling or grammatical errors.

 

Does not adhere to current APA manual writing rules and style.

Main Post: Timeliness

Excellent: 10 (10%) – 10 (10%)

Posts main post by day 3.

Good: 0 (0%) – 0 (0%) | Fair: 0 (0%) – 0 (0%) | Poor: 0 (0%) – 0 (0%)

Does not post by day 3.

First Response

Excellent: 17 (17%) – 18 (18%)

Response exhibits synthesis, critical thinking, and application to practice settings.

 

Responds fully to questions posed by faculty.

 

Provides clear, concise opinions and ideas that are supported by at least two scholarly sources.

 

Demonstrates synthesis and understanding of learning objectives.

 

Communication is professional and respectful to colleagues.

 

Responses to faculty questions are fully answered, if posed.

 

Response is effectively written in standard, edited English.

Good: 15 (15%) – 16 (16%)

Response exhibits critical thinking and application to practice settings.

 

Communication is professional and respectful to colleagues.

 

Responses to faculty questions are answered, if posed.

 

Provides clear, concise opinions and ideas that are supported by two or more credible sources.

 

Response is effectively written in standard, edited English.

Fair: 13 (13%) – 14 (14%)

Response is on topic and may have some depth.

 

Responses posted in the discussion may lack effective professional communication.

 

Responses to faculty questions are somewhat answered, if posed.

 

Response may lack clear, concise opinions and ideas, and a few or no credible sources are cited.

Poor: 0 (0%) – 12 (12%)

Response may not be on topic and lacks depth.

 

Responses posted in the discussion lack effective professional communication.

 

Responses to faculty questions are missing.

 

No credible sources are cited.

Second Response

Excellent: 16 (16%) – 17 (17%)

Response exhibits synthesis, critical thinking, and application to practice settings.

 

Responds fully to questions posed by faculty.

 

Provides clear, concise opinions and ideas that are supported by at least two scholarly sources.

 

Demonstrates synthesis and understanding of learning objectives.

 

Communication is professional and respectful to colleagues.

 

Responses to faculty questions are fully answered, if posed.

 

Response is effectively written in standard, edited English.

Good: 14 (14%) – 15 (15%)

Response exhibits critical thinking and application to practice settings.

 

Communication is professional and respectful to colleagues.

 

Responses to faculty questions are answered, if posed.

 

Provides clear, concise opinions and ideas that are supported by two or more credible sources.

 

Response is effectively written in standard, edited English.

Fair: 12 (12%) – 13 (13%)

Response is on topic and may have some depth.

 

Responses posted in the discussion may lack effective professional communication.

 

Responses to faculty questions are somewhat answered, if posed.

 

Response may lack clear, concise opinions and ideas, and a few or no credible sources are cited.

Poor: 0 (0%) – 11 (11%)

Response may not be on topic and lacks depth.

 

Responses posted in the discussion lack effective professional communication.

 

Responses to faculty questions are missing.

 

No credible sources are cited.

Participation

Excellent: 5 (5%) – 5 (5%)

Meets requirements for participation by posting on three different days.

Good: 0 (0%) – 0 (0%) | Fair: 0 (0%) – 0 (0%) | Poor: 0 (0%) – 0 (0%)

Does not meet requirements for participation by posting on three different days.

Total Points: 100