IVs and DVs

Sample Answer for IVs and DVs Included After Question

General QDAFI Grading Rubric

Q: What question did the researchers ask?
- 2/2 for the full question
- 1.5/2 for mostly correct (missing a detail, etc.)
- 1/2 for along the right lines but missing a major point
- 0/2 for not at all correct

D: What did the researchers do?
- 2/2 for including most or all IVs and DVs
- 1.5/2 for including some IVs and DVs
- 1/2 for failing to include most IVs and DVs
- 0/2 for not including any

A: What is the authors' rationale?
- 2/2 for including the rationale for the research question and what they expect (the hypothesis)
- 1.5/2 for including the rationale or the hypothesis
- 1/2 for not including either but writing something creditable (along the right lines)
- 0/2 for not including anything relevant

F: What did they find?
- 2/2 for including all relevant findings (at least the main one) without numbers or exact statistics
- 1.5/2 for including some relevant findings but not the main one
- 1/2 for not including the findings but writing something creditable (along the right lines)
- 0/2 for not including anything relevant

I: What is the authors' interpretation?
- 2/2 for including a summary of what they found and their interpretation; alternatively, their explanation, some major issue with it, and the student's own interpretation
- 1.5/2 for including some of the above
- 1/2 for not including either but writing something creditable (along the right lines)
- 0/2 for not including anything relevant

Sample Explanations – for grading

In your 'Q' section: Aim to answer the specific research question the authors are asking. Are they looking at just any predictors, or are there specific predictors they aim to study?

In your 'D' section: Clearly state the authors' IVs, DVs, controls (if any), and methods of measurement. You don't need to list the specific tests in your 'D', just the constructs they measure; the reader would only need the specific tests if they were planning to replicate the study.
In your 'A' section: You've got part of it there (the hypothesis), but you left out the "why." Why do they think they'll find this? What is the motivating drive that led them to do 'D' to answer 'Q'? Remember to include both the authors' rationale and their hypothesis. You have a good hypothesis here, but you're lacking the complete rationale (why they think they'll find this).

In your 'A' section: You've got part of it there (the rationale), but you left out the hypothesis. Think of this section as an "if, then" statement: IF (the rationale and motivating drive behind the research question are true), THEN (this is what they think will happen). You don't have to write it in this manner, but it will help with the thought process. Remember to include both the authors' rationale and their hypothesis (what they think the outcome of the study will be). You have a good rationale here, but you're lacking the hypothesis.

In your 'I' section: Full credit here includes a summary of what they found and their interpretation. Alternatively, include their explanation, some major issue with it, and your own interpretation. Be sure to include the significant real-world implications these findings could or could not have, along with any issues there could be with the study (e.g., reliability, generalizability, etc.).
Corrigendum (Psychological Science, 2017, Vol. 28(3), p. 403, DOI: 10.1177/0956797617697667): As a result of an oversight, the order in which the authors of the original article were listed was incorrect. The correct order of authorship is: Wu Youyou, David Stillwell, H. Andrew Schwartz, and Michal Kosinski.

Birds of a Feather Do Flock Together: Behavior-Based Personality-Assessment Method Reveals Personality Similarity Among Couples and Friends

Psychological Science, 2017, Vol. 28(3), 276–284. DOI: 10.1177/0956797616678187

Wu Youyou (1,2), David Stillwell (3), H. Andrew Schwartz (4), and Michal Kosinski (5)
1 Department of Psychology, University of Cambridge; 2 Kellogg School of Management, Northwestern University; 3 Judge Business School, University of Cambridge; 4 Department of Computer Science, Stony Brook University; 5 Graduate School of Business, Stanford University

Abstract

Friends and spouses tend to be similar in a broad range of characteristics, such as age, educational level, race, religion, attitudes, and general intelligence.
Surprisingly, little evidence has been found for similarity in personality—one of the most fundamental psychological constructs. We argue that the lack of evidence for personality similarity stems from the tendency of individuals to make personality judgments relative to a salient comparison group, rather than in absolute terms (i.e., the reference-group effect), when responding to the self-report and peer-report questionnaires commonly used in personality research. We employed two behavior-based personality measures to circumvent the reference-group effect. The results based on large samples provide evidence for personality similarity between romantic partners (n = 1,101; rs = .20–.47) and between friends (n = 46,483; rs = .12–.31). We discuss the practical and methodological implications of the findings.

Keywords: similarity, personality assessment, reference-group effect, social network, close relationships

Received 5/4/15; Revision accepted 10/16/16

It is well established that in close relationships, individuals tend to be similar in a wide range of characteristics (McPherson, Smith-Lovin, & Cook, 2001), including age, education, race, religion, attitudes, and general intelligence (Rushton & Bons, 2005). Surprisingly, little evidence has been found for personality—a fundamental psychological construct that underpins much of the variation in human behaviors.
Most past research has shown no or only weak similarity in personality between partners and between friends (Altmann, Sierau, & Roth, 2013; Anderson, Keltner, & John, 2003; Beer, Watson, & McDade-Montez, 2013; Botwin, Buss, & Shackelford, 1997; Buss, 1984a; Funder, Kolar, & Blackman, 1995; Rushton & Bons, 2005; Watson, Beer, & McDade-Montez, 2014; Watson, Hubbard, & Wiese, 2000a; Watson et al., 2004), with occasional findings indicating moderate similarity in the Big Five factors of Openness to Experience and Conscientiousness between romantic partners (Donnellan, Conger, & Bryant, 2004; McCrae et al., 2008; Watson, Hubbard, & Wiese, 2000b). This has led researchers to maintain the conclusion drawn by an early theorist that "mating is essentially random for personality differences" (Eysenck, 1990, p. 252).

We argue that the lack of consistent evidence for personality similarity among couples or friends stems from the reliance on self-report and peer report1 of personality in a majority of previous studies. These assessment methods are unsuitable for studying the similarity effect, because they are affected by a tendency of the respondents to judge themselves relative to a salient comparison group, rather than in absolute terms (the reference-group effect; Heine, Buchtel, & Norenzayan, 2008; Heine, Lehman, Peng, & Greenholtz, 2002). For instance, an introverted engineer might perceive himself as relatively extraverted if he is surrounded by a group of even more introverted engineer friends. The same bias affects peer report as well; the introverted friends of the engineer might also see him as extraverted by comparing him with themselves.
In fact, some widely used personality questionnaires specifically instruct people to describe themselves "in relation to other people you know" (e.g., the International Personality Item Pool, IPIP, measuring the five-factor model of personality; Goldberg et al., 2006). Several studies have found that self-reports of personality do not always correspond with behavioral measures (Heine et al., 2008; Ramírez-Esparza, Mehl, Álvarez-Bermúdez, & Pennebaker, 2009). The authors of these studies have suggested that the reference-group effect is a possible explanation. Subsequent experimental studies confirmed that the reference-group effect indeed pertains to questionnaire-based personality judgments (Credé, Bashshur, & Niehorster, 2010; Wood, Brown, Maltby, & Watkinson, 2012). We therefore argue that self- and peer report are inappropriate methods for studying personality similarity, because they amplify the differences in actual personality and obscure the similarity among partners and friends, who likely unconsciously treat one another as reference groups.

Indeed, rare evidence of personality similarity emerged from a few studies relying on personality measures that are less susceptible to the reference-group effect. For example, Botwin et al. (1997) and Buss (1984a) measured personality using independent interviewers' ratings and found similarity among spouses. Admittedly, this type of measure is still subject to the reference-group effect because the interviewer has his or her own reference group, but it affects both dyad members equally and therefore does not obscure the similarity between them. Buss (1984b) also found similarity between romantic partners by measuring personality using self- and peer-reported frequencies of certain personality-related behaviors (the act-frequency approach; Buss & Craik, 1983).
Introversion, for example, was assessed by asking participants to judge whether in the last 3 months they "watched the soap opera on TV" or "went for a long walk alone" (Buss, 1984b, p. 368). This approach focuses on concrete behaviors and thus leaves less room for subjective comparisons.

In light of these mixed findings, we aimed to address the reference-group effect and reexamine the existence of personality similarity between romantic couples and between friends. We employed two behavior-based personality measures to circumvent the reference-group effect.

The first approach measured personality using a common type of digital footprint: Facebook Likes. Facebook users generate Likes by clicking a Like button on Facebook Pages related to products, famous people, books, etc.2 This feature allows users to express their preferences for a variety of content. It has been shown that Likes can be used to accurately assess people's personality (Kosinski, Stillwell, & Graepel, 2013; Youyou, Kosinski, & Stillwell, 2015). For example, people who score high on Extraversion tend to Like "partying," "dancing," and celebrities such as "Snooki" (a reality-TV personality).3

The second approach measured personality using digital records of language use: Facebook status updates. Facebook users write status updates to share their thoughts, feelings, and life events with friends. Previous research has consistently found links between personality and language use (Hirsh & Peterson, 2009; Mehl, Gosling, & Pennebaker, 2006; Tausczik & Pennebaker, 2010). Extraverts, for example, tend to use more words describing positive emotions (e.g., "great," "happy," or "amazing"; H. A. Schwartz et al., 2013) than introverts do. Several studies have demonstrated accurate personality assessment based on people's language use in social media (Farnadi et al., 2014; Sumner, Byers, Boochever, & Park, 2012), including Facebook status updates (Park et al., 2014; H. A. Schwartz et al., 2013).
For both Likes-based and language-based approaches, we measured personality in the following way. First, we obtained a sample of participants with both self-reports of personality and Facebook data. Next, we built a series of predictive models to link self-reports of personality with Likes or language use, respectively. This process allowed us to establish which digital signals were indicative of specific personality traits. The resulting models were then applied to a separate sample of romantic partners and friends to generate personality scores for these participants. The personality scores were correlated between dyad members to measure similarity.

Notably, although both Likes-based and language-based models are developed based on participants' self-reported personality scores, they do not inherit the reference-group effect. The reference-group bias that contaminates personality similarity is a result of individuals using different standards, norms, or reference groups to evaluate themselves. In our analysis, the same personality-prediction models were applied to the entire sample, and therefore the evaluation standards were uniform across all participants.

Method

Likes-based personality assessment

The Likes-based personality-assessment model was built using Sample 1, following the procedure described in detail in Youyou et al. (2015). We first transformed participants' Like data into a matrix, in which each row represented a participant, and each column represented a Like. The (i, j) entry was set to 1 if participant i liked object j, and 0 otherwise. A substantial number of Likes were associated with only a few participants in this sample, and some participants had only a small number of Likes. Since the assessment models leveraged the association between liking certain things and having a particular personality type, it was necessary to have enough combinations of personality profiles and Likes as training examples.
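The participant-Like matrix just described can be sketched in a few lines. The participants and Pages below are hypothetical placeholders, not the authors' data or code:

```python
import numpy as np

# Hypothetical (participant, liked_page) pairs; illustrative only.
likes = [
    ("alice", "page_jazz"),
    ("alice", "page_hiking"),
    ("bob", "page_jazz"),
    ("carol", "page_hiking"),
]

participants = sorted({p for p, _ in likes})   # one row per participant
pages = sorted({g for _, g in likes})          # one column per Like
row = {p: i for i, p in enumerate(participants)}
col = {g: j for j, g in enumerate(pages)}

# The (i, j) entry is 1 if participant i liked object j, and 0 otherwise.
M = np.zeros((len(participants), len(pages)), dtype=np.int8)
for p, g in likes:
    M[row[p], col[g]] = 1
```

At the article's scale (hundreds of thousands of rows and columns, mostly zeros), a sparse matrix format would be used instead of a dense array, but the indexing logic is the same.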
The matrix was therefore trimmed so that participants with fewer than 20 Likes and Likes associated with fewer than 20 participants were removed. The resulting matrix consisted of 295,320 participants (rows) and 148,128 unique Likes (columns). For each of the five personality traits, a linear regression model was fitted to predict the self-reported personality scores from the participant-Like matrix (each column was treated as a variable); a combination of L1 (least absolute shrinkage and selection operator, or LASSO; Tibshirani, 1996) and L2 (ridge; Hoerl & Kennard, 1970) penalties were used for the models.4 A 10-fold cross-validation was applied in each model to avoid overfitting.

The resulting five models, one for each personality trait, were applied to a separate sample of romantic couples (n = 990) and friends (n = 41,880) in Sample 3, to generate behavior-based personality scores for these participants. This sample contained only couples and friendship dyads in which both members had at least 20 Likes on their profile. The average number of Likes per participant was 159.4. We removed Likes shared between each pair of dyad members to ensure that they did not artificially inflate similarity. Such an overlap in Likes between dyad members was relatively low: friends, on average, shared 5.2 Likes, or 1.4% of their joint Likes; romantic partners shared 12.8 Likes, or 3.5% of their joint Likes.

To evaluate the predictive accuracy of the Likes-based models, we correlated Likes-based and self-reported personality scores for a subset of participants in Sample 3 (note that the model was developed using Sample 1). After shared Likes were removed for all individuals, the correlations were as follows—Openness to Experience: r(22,692) = .39, Conscientiousness: r = .28, Extraversion: r = .31, Agreeableness: r = .25, and Neuroticism: r = .29.

Participants

This study relied on three samples obtained from the myPersonality project (http://mypersonality.org).
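The penalized-regression step described above (combined L1 and L2 penalties with cross-validation) can be sketched with synthetic data standing in for the trimmed participant-Like matrix. The feature counts, coefficients, and scikit-learn usage are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)

# Synthetic stand-in for the trimmed matrix: 200 hypothetical
# participants x 30 binary Like indicators.
X = rng.integers(0, 2, size=(200, 30)).astype(float)

# Synthetic "self-reported trait score" driven by a few Likes plus noise.
w_true = np.zeros(30)
w_true[:3] = [1.0, -0.8, 0.6]
y = X @ w_true + rng.normal(scale=0.3, size=200)

# Combined L1 (LASSO) and L2 (ridge) penalties with 10-fold
# cross-validation, in the spirit of the models described above.
model = ElasticNetCV(l1_ratio=0.5, cv=10, random_state=0).fit(X, y)
predicted = model.predict(X)   # behavior-based trait scores
r = float(np.corrcoef(predicted, y)[0, 1])
```

In the study, one such model was fitted per trait on Sample 1 and then applied to held-out dyad members in Sample 3; here a single model is fitted and evaluated in-sample purely for illustration.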
MyPersonality was a popular Facebook application that allowed users to take psychological tests and receive feedback on their scores. A portion of participants provided opt-in consent to allow us to record their test scores and the contents of their Facebook profile. The average participant age was 24.1 years. Females constituted 61.1% of the sample, and males constituted 38.9%.

Sample 1 was used to build the Likes-based personality-assessment models. It contained 295,320 participants who completed personality questionnaires and had at least 20 Likes on their Facebook profile. Sample 2 was used to develop the language-based personality-assessment models. It contained 59,547 participants who completed personality questionnaires and wrote at least 500 words across all of their status updates. Sample 3 was used to study the existence of personality similarity between romantic partners and between friends. It contained 247,773 individuals forming a total of 5,042 heterosexual romantic couples and 138,553 friendship dyads. Romantic couples were identified using the "relationship status" field of the Facebook profile, and friendship connections were identified using Facebook friend lists. To ensure that all dyads included in the analysis were independent from one another, we randomly chose one dyad for the individuals belonging to multiple friendship dyads.

Self-report of personality

Self-reports of personality were obtained using a 20- to 100-item IPIP questionnaire (Goldberg et al., 2006) measuring the widely accepted five-factor model of personality (Revised NEO Personality Inventory; Costa & McCrae, 1992). Reliability scores for the 100-item questionnaire, completed by 29.4% of the participants, were as follows—Openness to Experience: Cronbach's α = .84, Conscientiousness: α = .92, Extraversion: α = .93, Agreeableness: α = .88, and Neuroticism: α = .93.
Corresponding values for the 20-item version, completed by 56.4% of the participants, were as follows—Openness to Experience: Cronbach's α = .48, Conscientiousness: α = .67, Extraversion: α = .73, Agreeableness: α = .58, and Neuroticism: α = .65. The remaining participants (14.2%) completed IPIP questionnaires ranging from 30 to 90 items (in intervals of 10). Self-reports were available for all participants in Samples 1 and 2, and for 4,287 romantic couples and 103,329 friendship dyads in Sample 3.

Language-based personality

The language-based personality-assessment model was developed using Sample 2, with an open-vocabulary approach similar to the one employed by Park et al. (2014). We first extracted words and phrases (i.e., sequences of words) from participants' status updates and then transformed them into two types of predictors: (a) binary indicators of whether the participant used each word and (b) relative frequencies of each word or phrase (as compared with the total number of words that each participant wrote). Words and phrases used by less than 1% of the participants were excluded when we created the predictors. The two types of predictors were each represented as a matrix and underwent randomized principal-component analysis (RPCA; Martinsson, Rokhlin, & Tygert, 2011) independently. They were then combined into a single participant-language matrix. For each of the five personality traits, a linear regression model with an L2 (ridge) penalty was fitted to predict the self-reported personality scores from the participant-language matrix. A 10-fold cross-validation was applied in each model to avoid overfitting.

Because words and phrases shared between two partners and between friends could artificially inflate personality similarity, it was necessary to control for the overlap in language between dyad members.
However, we could not exclude all the words shared between dyad members (as we did with Likes), because most of the common words would be removed as a result, which would lower the predictive accuracy. Instead, we randomly split all the available words and phrases into two halves and submitted each half to the procedures described in the previous paragraph. The two resulting matrices were regressed onto participants' self-reported personality scores to build two independent sets of predictive models. Finally, the two sets of models were each applied to a different member of the dyad. This process ensured that even if two dyad members used the same words or phrases, the two different models applied to each of them separately would capture only distinct parts of the overlap.

The two sets of models developed here were applied to 282 romantic couples and 5,674 friendship dyads in Sample 3. These dyads all consisted of members who both had status updates available and wrote at least 500 words across all of their status updates. Participants in this sample wrote 4,474 words on average.

To determine the predictive accuracy of the language-based models, we correlated language-based and self-reported personality scores for a subset of participants in Sample 3 (note that the model was developed using Sample 2). After the overlap in language was controlled for (by applying independent models to each dyad member), the correlations were as follows—Openness to Experience: r(2,718) = .37, Conscientiousness: r = .32, Extraversion: r = .34, Agreeableness: r = .30, and Neuroticism: r = .33.

Measuring similarity

Similarity between dyad members was measured by correlating their scores on a given personality trait across all dyads. Correlations were calculated for self-report, Likes-based, and language-based measures, respectively, between partners and between friends.
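The split-half modeling scheme described above can be sketched as follows. The ridge solver, feature counts, and data are all illustrative stand-ins rather than the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)

def ridge_fit(X, y, lam=1.0):
    # Closed-form ridge regression (L2 penalty), standing in for the
    # penalized models described in the article.
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Hypothetical training data: 100 users x 20 language features.
X = rng.normal(size=(100, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.5, size=100)

# Randomly split the feature columns into two disjoint halves and fit
# an independent model on each half.
cols = rng.permutation(20)
half_a, half_b = cols[:10], cols[10:]
w_a = ridge_fit(X[:, half_a], y)
w_b = ridge_fit(X[:, half_b], y)

# Apply a different model to each member of a dyad, so a shared word
# cannot feed both members' scores through the same feature.
member_a = rng.normal(size=20)   # one member's feature vector
member_b = rng.normal(size=20)   # the other member's feature vector
score_a = float(member_a[half_a] @ w_a)
score_b = float(member_b[half_b] @ w_b)
```

Because the two halves share no columns, any overlap in the dyad's language enters the two scores through disjoint sets of features, which is the point of the design.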
Additionally, we calculated correlations between one dyad member's Likes-based score and this person's partner's or friend's language-based score across all dyads. For romantic couples, personality scores were aligned by gender, and Pearson product-moment correlation coefficients were used. For friendship dyads, intraclass correlations were used because dyads cannot be aligned by gender, and the assignment of dyad members as person A or person B is arbitrary (see Watson et al., 2000b).5

Results

The goal of this study was to examine the degree of personality similarity between romantic partners and between friends. The results based on three different personality measures are presented in Figure 1. The Likes-based scores between dyad members showed significant positive correlations across all five personality traits—romantic couples: mean r(988) = .24, 95% confidence interval (CI) = [.18, .30]; friends: mean r(83,758) = .14, 95% CI = [.13, .15].6 An even stronger effect was observed in the language-based results—romantic couples: mean r(280) = .38, 95% CI = [.28, .48]; friends: mean r(11,346) = .24, 95% CI = [.22, .26]. Using both Likes-based and language-based measures, the correlations did not differ substantially between same-sex and opposite-sex friendships (all differences were .03 or less). In contrast, self-reports showed weak to negligible personality similarity for both romantic couples, mean r(4,285) = .10, 95% CI = [.07, .13], and friends, mean r(206,656) = .06, 95% CI = [.06, .07]. All these correlations were significant at p < .001.

The strength of personality similarity became clear when compared with the similarity observed for other variables. Personality similarity was not as strong as similarity in age—romantic couples: r(2,458) = .81, 95% CI = [.80, .82], p < .001; friends: r(85,076) = .57, 95% CI = [.57, .58], p < .001.
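The intraclass correlation used for friendship dyads can be computed by one common route, the double-entry method: each dyad is entered twice, once in each order, and an ordinary Pearson r is taken, so the arbitrary labeling of dyad members cannot matter. A minimal sketch with hypothetical scores (not the authors' code):

```python
import numpy as np

def double_entry_icc(pairs):
    # Pairwise intraclass correlation for exchangeable dyads: enter
    # every dyad twice (A-B and B-A), then take an ordinary Pearson r.
    a = np.array([x for x, _ in pairs] + [y for _, y in pairs], dtype=float)
    b = np.array([y for _, y in pairs] + [x for x, _ in pairs], dtype=float)
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical trait scores for three friendship dyads.
dyads = [(0.9, 1.1), (-0.2, 0.1), (-1.0, -0.8)]
icc = double_entry_icc(dyads)
```

Swapping the order of members within any dyad leaves the result unchanged, which is exactly the property a Pearson correlation over arbitrarily ordered pairs would lack.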
However, it was comparable to or stronger than similarity in IQ: r(550) = .21, 95% CI = [.13, .29], p < .001 (this sample included both romantic couples and friends because there were not enough romantic couples in which both partners had IQ scores, n = 44, to allow for a meaningful comparison).

[Fig. 1. Radar charts showing the similarity in the Big Five personality traits (O: Openness to Experience, C: Conscientiousness, E: Extraversion, A: Agreeableness, N: Neuroticism) between romantic partners and between friends. Results are shown separately for analyses based on Facebook Likes, Facebook language use, the combination of these two measures, and self-report questionnaires.]

These similarity results were based on personality scores measured using nonoverlapping Likes and language features. This was because the overlap in Likes or language features between dyad members might have been driven by factors other than personality, such as a shared environment, shared culture, or interpersonal influence. However, it also might have been partially driven by actual personality similarity. Consequently, these results represent a lower-bound estimate of similarity—some effect was lost. To calculate the upper-bound estimate, we performed the same analyses without controlling for shared Likes or language features. As expected, the results showed a stronger level of similarity.
The Likes-based correlations were as follows—romantic couples: mean r(1,082) = .33, 95% CI = [.28, .38]; friends: mean r(87,842) = .19, 95% CI = [.18, .20]. The language-based ones were as follows—romantic couples: mean r(280) = .41, 95% CI = [.31, .50]; friends: mean r(11,346) = .25, 95% CI = [.23, .27], all ps < .001.

One potential problem with the preceding analyses was that the scores of both dyad members were based on the same type of data, namely, Likes or status updates. This was problematic, as Facebook's News Feed and its recommendation system might cause an artificial covariation between friends' Likes or status updates (i.e., common-method bias; Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). For example, Facebook recommends Pages to its users that are similar to the ones their friends liked. Also, users are constantly exposed to their friends' status updates in the News Feed and are therefore prone to post about similar topics. While we had already controlled for overlap in Likes and language features, to further reduce potential sources of bias, we correlated the Likes-based scores of one person with the language-based scores of that person's partner or friend. The results were similar to the ones already reported: The average personality similarity across the five traits was as follows—romantic couples: mean r(1,055) = .31, 95% CI = [.26, .36], ps < .001; friends: mean r(32,552) = .19, 95% CI = [.18, .20], ps < .001.

Additionally, a series of analyses was performed to rule out alternative explanations for the observed personality similarity. First, we calculated the correlations between random pairs of participants to gauge the baseline similarity between strangers. None of the correlations were significant (|rs| < .01) for random dyads. Second, dyad members' scores were correlated controlling for the number of Likes that they had or the number of words written in the status updates.
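A control of this kind can be implemented as a partial correlation: residualize both members' scores on the control variable, then correlate the residuals. The sketch below uses synthetic numbers; nothing in it comes from the study:

```python
import numpy as np

def partial_corr(x, y, z):
    # Correlate x and y after removing the linear effect of the control
    # variable z (e.g., number of Likes) from both.
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(2)
z = rng.normal(size=2000)               # shared control variable
x = 0.5 * z + rng.normal(size=2000)     # one member's predicted score
y = 0.5 * z + rng.normal(size=2000)     # the other member's score

r_zero_order = float(np.corrcoef(x, y)[0, 1])
r_partial = partial_corr(x, y, z)       # shrinks toward zero here
```

In this synthetic setup the two scores are related only through z, so the zero-order correlation is positive while the partial correlation is near zero; in the study, by contrast, the partial correlations stayed within .02 of the zero-order ones, indicating the control variable explained little.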
This was to ensure that similarity in predicted personality between partners or between friends was not due to having a similar number of digital signals. These partial correlations were very similar to the zero-order ones (within .02 of the original values).

Third, we investigated the extent to which the observed personality similarity was a by-product of similarity in other traits (Buss, 1984a). To this end, we reran the analyses while controlling for dyad members' age and education. For education, we calculated correlations for a subsample of friendship dyads in which both members were college graduates. We used keywords such as "university" or "college" (excluding "community college") in the school names shown on Facebook profiles to identify a sample of participants with higher education degrees. The level of similarity between friends did not change considerably after taking education into account: All correlations were within .04 of the original values. Unfortunately, information about education was not available for enough of the romantic couples to provide for a meaningful comparison (n = 17 for the Likes-based approach and n = 6 for the language-based approach). The analysis was therefore limited to friends only. Similarly, little change was observed when we controlled for age. For both romantic couples and friends, partial correlations that controlled for age were all within .03 of the zero-order ones. The only exception was Conscientiousness, for which the correlations decreased on average by .10 for romantic couples and .08 for friends across the three methods. Nevertheless, the similarity in Conscientiousness remained significant at p < .001 (romantic partners: mean r = .30; friends: mean r = .16).

Discussion

Our findings provide evidence that romantic partners as well as friends are characterized by similar personalities.
We measured personality traits relying on three different sources of data: traditional self-report questionnaires, digital records of behaviors and preferences, and language use. Relatively strong similarity was detected between romantic partners and between friends when we used Likes-based and language-based measures. By contrast, self-reports yielded only weak to negligible similarity. Across all three methods, stronger personality similarity was found for romantic couples than for friends.

We also showed that dyadic similarity in most personality traits was unlikely to be driven simply by similarity in age or education. The only exception was dyad members' similarity in Conscientiousness, which was partially explained by their similarity in age. Compared with the other four traits, Conscientiousness is most strongly positively associated with age, especially before the age of 30 (Donnellan & Lucas, 2008; Soto, John, Gosling, & Potter, 2011). Because 88% of the participants were between 18 and 30 years old, it is not surprising that partners' and friends' similarity in Conscientiousness was partially due to their similarity in age.

In which of the five personality traits were romantic partners and friends most similar? After controlling for age, we found that Openness to Experience displayed the strongest similarity in self-reports, Likes-based results, and Likes-language correlations for both romantic couples and friendship dyads. Language-based results, however, showed the strongest effect in Extraversion. However, we cannot draw definitive conclusions on the basis of our present analysis, because (a) the patterns were not consistent across all the methods that we employed, and (b) the effect sizes could be influenced by several factors, such as the strength of the reference-group effect, the accuracy of the assessment models, and common-method bias. These factors might affect the five traits differently and to varying degrees.
Together, these results challenge the widely accepted notion that individuals in close relationships are not similar in personality. We argue that the scarcity of the evidence for the similarity effect is likely due to the reference-group effect. Notably, our results are consistent with those obtained in rare previous studies that relied on personality-assessment methods resistant to the reference-group effect (Botwin et al., 1997; Buss, 1984a, 1984b). On the other hand, the fact that the results presented contradict the majority of previous findings means that they should be treated with caution. We hope that future research will replicate our findings using other methods.

From a methodological perspective, the present research highlights the limitations of questionnaire measures (Heine et al., 2002). While self-report personality questionnaires provide excellent reliability and validity in most applications (Costa & McCrae, 1992; Goldberg et al., 2006; Ozer & Benet-Martínez, 2006), they fail to assess personality similarity between individuals. As illustrated in the present research, personality assessment based on digital records of preferences and language, while still relatively new and unproven, has the potential to address this issue. Both Likes-based and language-based personality-assessment methods are unlikely to be affected by the reference-group effect: People do not use words or do things just because their friends refrained from doing so. In fact, the reverse effect is likely: A shared environment, culture, or interpersonal influence may inflate the similarity in Likes and language between dyad members. For example, people from the same cohort are likely to be fans of similar pop stars of their generation, and two friends might both write “explosion” in their status updates because they live in the same area and an explosion recently happened nearby.
We addressed these limitations by controlling for age and education, removing Likes shared between dyad members, applying two disjoint sets of language-based models to each of the dyad members, and correlating Likes-based scores with language-based scores. However, we might have still omitted variables, such as subcultures, driving the adoption of similar language patterns or preferences online.

Finally, if the reference-group effect obscures the similarity in personality measured using self-reports, why has similarity been consistently detected for self-reported attitudes and values, such as religious and political views (e.g., Alford, Hatemi, Hibbing, Martin, & Eaves, 2011; Gaunt, 2006)? There are two possible explanations. First, the similarity in values and attitudes might have been underestimated and would be higher if the reference-group effect was eliminated. Past research shows that larger effects were generally found when behavioral or objective indicators of attitudes were used (e.g., church attendance; see Alford et al., 2011; Gaunt, 2006; Watson et al., 2004) compared with self-ratings on broad statements using the Likert scale (e.g., “How religious are you?”; see Caspi, Herbener, & Ozer, 1992; Gaunt, 2006; Lee et al., 2009; Watson et al., 2014). This pattern suggests that the reference-group effect applies to self-reported attitudes as well. Second, self-reports of values and attitudes might be subject to the counter-motive of conformity, canceling the reference-group effect. Past studies have found that people shift their attitudes to align with those of their romantic partners (Davis & Rusbult, 2001; Kalmijn, 2005). In contrast, no such tendency has been discovered for personality (Anderson et al., 2003; Caspi et al., 1992). We hope that future research will replicate our analysis in attitudinal domains, especially in basic values (e.g., S. H. Schwartz, 1992), a domain in which measures other than self-report are in short supply.
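The covariate checks reported above (partial correlations controlling for age, compared against the zero-order correlations) can be illustrated with a minimal sketch. This is not the authors' code: the variable names and the simulated data are assumptions, and partial correlation is implemented in the standard way, by residualizing both dyad members' trait scores on the covariate before correlating the residuals.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, covar):
    """Correlation between x and y after regressing the covariate out of both.

    This is the usual construction of a partial correlation: residualize
    each variable on the covariate(s) via least squares, then correlate
    the residuals.
    """
    # Design matrix: intercept plus covariate
    Z = np.column_stack([np.ones(len(x)), covar])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return stats.pearsonr(rx, ry)[0]

# Simulated dyads (hypothetical data): both members' trait scores are
# partly driven by a shared covariate (age), so the zero-order
# correlation overstates their trait-specific similarity.
rng = np.random.default_rng(0)
age = rng.normal(25, 4, 200)
trait_a = 0.3 * age + rng.normal(size=200)
trait_b = 0.3 * age + rng.normal(size=200)

zero_order = stats.pearsonr(trait_a, trait_b)[0]
partial = partial_corr(trait_a, trait_b, age)
# Here the partial correlation should be near zero, because the
# zero-order similarity is driven entirely by shared age.
```

In the paper's analysis the partial and zero-order correlations were nearly identical for most traits (Conscientiousness being the exception), which is what licenses the conclusion that the similarity is not merely an age or education artifact.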
Action Editor

Ralph Adolphs served as action editor for this article.

Author Contributions

W. Youyou designed the research. M. Kosinski and D. Stillwell collected and cleaned the data, and W. Youyou, H. A. Schwartz, and M. Kosinski performed the analysis. W. Youyou, D. Stillwell, H. A. Schwartz, and M. Kosinski wrote the manuscript.

Acknowledgments

We thank John Rust, Patrick Morse, Sandra Matz, Jason Rentfrow, Josh Sacco, Vesselin Popov, and Jingwei Yu for their critical reading of the manuscript. We thank Isabelle Abraham for proofreading.

Declaration of Conflicting Interests

D. Stillwell received revenues as the owner of the myPersonality application that collected the data used in this research. The authors declared that they had no other potential conflicts of interest with respect to their authorship or the publication of this article.

Personality Similarity

Notes

1. Peer report here refers to when members in a dyad (e.g., a couple or a pair of friends) rate each other’s personality.
2. Although Facebook users can Like many different types of content on Facebook—including status updates, photos, comments, pages, and links—we measured Likes associated only with Facebook Pages.
3. See Kosinski et al. (2013) for more examples of Likes strongly associated with each personality trait.
4. LASSO and ridge are variable-selection-and-constraint methods that are best used for a large number of predictors, especially when they are highly correlated. They constrain, or penalize, the absolute size of the regression coefficients so that no coefficients are too large when compared with others.
5. We also tested an alternative approach: randomly assigning friendship dyad members as person A or person B, and conducting Pearson product-moment correlations. The results were very similar to those obtained with the intraclass correlation (within .02 of the original values).
6. The average correlations across the five traits were calculated using Fisher’s r-to-z transformation.
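The averaging procedure mentioned in Note 6 is standard: transform each correlation with Fisher's z (arctanh), average the z values, and map the mean back with tanh. A minimal sketch; the input correlations below are made-up illustrations, not values from the paper.

```python
import numpy as np

def mean_correlation(rs):
    """Average correlation coefficients via Fisher's r-to-z transformation.

    z = arctanh(r) is approximately normally distributed with
    near-constant variance, which makes averaging in z-space
    better behaved than averaging raw r values.
    """
    zs = np.arctanh(np.asarray(rs, dtype=float))
    return float(np.tanh(zs.mean()))

# Hypothetical per-trait correlations (illustrative only)
avg = mean_correlation([0.10, 0.25, 0.40])
# Slightly larger than the naive arithmetic mean (0.25), because
# arctanh stretches larger correlations before they are averaged
```

The back-transformed mean differs only modestly from the arithmetic mean for small correlations, but the z-space average is the conventional choice when correlations of different sizes are pooled.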
References

Alford, J. R., Hatemi, P. K., Hibbing, J. R., Martin, N. G., & Eaves, L. J. (2011). The politics of mate choice. The Journal of Politics, 73, 362–379.
Altmann, T., Sierau, S., & Roth, M. (2013). I guess you’re just not my type. Journal of Individual Differences, 34, 105–117.
Anderson, C., Keltner, D., & John, O. P. (2003). Emotional convergence between people over time. Journal of Personality and Social Psychology, 84, 1054–1068.
Beer, A., Watson, D., & McDade-Montez, E. (2013). Self-other agreement and assumed similarity in neuroticism, extraversion, and trait affect: Distinguishing the effects of form and content. Assessment, 20, 723–737.
Botwin, M. D., Buss, D. M., & Shackelford, T. K. (1997). Personality and mate preferences: Five factors in mate selection and marital satisfaction. Journal of Personality, 65, 107–136.
Buss, D. M. (1984a). Marital assortment for personality dispositions: Assessment with three different data sources. Behavior Genetics, 14, 111–123.
Buss, D. M. (1984b). Toward a psychology of person-environment (PE) correlation: The role of spouse selection. Journal of Personality and Social Psychology, 47, 361–377.
Buss, D. M., & Craik, K. H. (1983). The act frequency approach to personality. Psychological Review, 90, 105–126.
Caspi, A., Herbener, E. S., & Ozer, D. J. (1992). Shared experiences and the similarity of personalities: A longitudinal study of married couples. Journal of Personality and Social Psychology, 62, 281–291.
Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual (Vol. 4). Odessa, FL: Psychological Assessment Resources.
Credé, M., Bashshur, M., & Niehorster, S. (2010). Reference group effects in the measurement of personality and attitudes. Journal of Personality Assessment, 92, 390–399.
Davis, J. L., & Rusbult, C. E. (2001). Attitude alignment in close relationships.
Journal of Personality and Social Psychology, 81, 65–84.
Donnellan, M. B., Conger, R. D., & Bryant, C. M. (2004). The Big Five and enduring marriages. Journal of Research in Personality, 38, 481–504.
Donnellan, M. B., & Lucas, R. E. (2008). Age differences in the Big Five across the life span: Evidence from two national samples. Psychology and Aging, 23, 558–566.
Eysenck, H. J. (1990). Genetic and environmental contributions to individual differences: The three major dimensions of personality. Journal of Personality, 58, 245–261.
Farnadi, G., Sushmita, S., Sitaraman, G., Ton, N., De Cock, M., & Davalos, S. (2014). A multivariate regression approach to personality impression recognition of vloggers. Proceedings of the 2014 ACM Multi Media on Workshop on Computational Personality Recognition (pp. 1–6). Retrieved from http://dl.acm.org/citation.cfm?id=2659526&dl=ACM&coll=DL&CFID=863560964&CFTOKEN=93316611
Funder, D. C., Kolar, D. C., & Blackman, M. C. (1995). Agreement among judges of personality: Interpersonal relations, similarity, and acquaintanceship. Journal of Personality and Social Psychology, 69, 656–672.
Gaunt, R. (2006). Couple similarity and marital satisfaction: Are similar spouses happier? Journal of Personality, 74, 1401–1420.
Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R., Ashton, M. C., Cloninger, C. R., & Gough, H. G. (2006). The international personality item pool and the future of public-domain personality measures. Journal of Research in Personality, 40, 84–96.
Heine, S. J., Buchtel, E. E., & Norenzayan, A. (2008). What do cross-national comparisons of personality traits tell us? The case of conscientiousness. Psychological Science, 19, 309–313.
Heine, S. J., Lehman, D. R., Peng, K. P., & Greenholtz, J. (2002). What’s wrong with cross-cultural comparisons of subjective Likert scales?: The reference-group effect. Journal of Personality and Social Psychology, 82, 903–918.
Hirsh, J. B., & Peterson, J. B. (2009).
Personality and language use in self-narratives. Journal of Research in Personality, 43, 524–527.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67.
Kalmijn, M. (2005). Attitude alignment in marriage and cohabitation: The case of sex-role attitudes. Personal Relationships, 12, 521–535.
Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, USA, 110, 5802–5805.
Lee, K., Ashton, M. C., Pozzebon, J. A., Visser, B. A., Bourdage, J. S., & Ogunfowora, B. (2009). Similarity and assumed similarity in personality reports of well-acquainted persons. Journal of Personality and Social Psychology, 96, 460–472.
Martinsson, P. G., Rokhlin, V., & Tygert, M. (2011). A randomized algorithm for the decomposition of matrices. Applied and Computational Harmonic Analysis, 30, 47–68.
McCrae, R. R., Martin, T. A., Hrebícková, M., Urbánek, T., Boomsma, D. I., Willemsen, G., & Costa, P. T., Jr. (2008). Personality trait similarity between spouses in four cultures. Journal of Personality, 76, 1137–1164.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.
Mehl, M. R., Gosling, S. D., & Pennebaker, J. W. (2006). Personality in its natural habitat: Manifestations and implicit folk theories of personality in daily life. Journal of Personality and Social Psychology, 90, 862–877.
Ozer, D. J., & Benet-Martínez, V. (2006). Personality and the prediction of consequential outcomes. Annual Review of Psychology, 57, 401–421.
Park, G. J., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., . . . Seligman, M. E. P. (2014). Automatic personality assessment through social media language. Journal of Personality and Social Psychology, 108, 934–952.
Podsakoff, P. M., MacKenzie, S.
B., Lee, J.-Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903.
Ramírez-Esparza, N., Mehl, M. R., Álvarez-Bermúdez, J., & Pennebaker, J. W. (2009). Are Mexicans more or less sociable than Americans? Insights from a naturalistic observation study. Journal of Research in Personality, 43, 1–7.
Rushton, J. P., & Bons, T. A. (2005). Mate choice and friendship in twins: Evidence for genetic similarity. Psychological Science, 16, 555–559.
Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., . . . Ungar, L. H. (2013). Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE, 8(9), Article e73791. doi:10.1371/journal.pone.0073791
Schwartz, S. H. (1992). Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 25, pp. 1–65). San Diego, CA: Academic Press.
Soto, C. J., John, O. P., Gosling, S. D., & Potter, J. (2011). Age differences in personality traits from 10 to 65: Big Five domains and facets in a large cross-sectional sample. Journal of Personality and Social Psychology, 100, 330–348.
Sumner, C., Byers, A., Boochever, R., & Park, G. J. (2012). Predicting Dark Triad personality traits from Twitter usage and a linguistic analysis of tweets. In D. Tao, M. A. Wani, T. Khoshgoftaar, X. Zhu, & N. Seliya (Eds.), Proceedings: 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012 (Vol. 2, pp. 386–393). Los Alamitos, CA: IEEE.
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29, 24–54.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso.
Journal of the Royal Statistical Society B: Statistical Methodology, 58, 267–288.
Watson, D., Beer, A., & McDade-Montez, E. (2014). The role of active assortment in spousal similarity. Journal of Personality, 82, 116–129.
Watson, D., Hubbard, B., & Wiese, D. (2000a). General traits of personality and affectivity as predictors of satisfaction in intimate relationships: Evidence from self- and partner-ratings. Journal of Personality, 68, 413–449.
Watson, D., Hubbard, B., & Wiese, D. (2000b). Self-other agreement in personality and affectivity: The role of acquaintanceship, trait visibility, and assumed similarity. Journal of Personality and Social Psychology, 78, 546–558.
Watson, D., Klohnen, E. C., Casillas, A., Simms, E. N., Haig, J., & Berry, D. S. (2004). Match makers and deal breakers: Analyses of assortative mating in newlywed couples. Journal of Personality, 72, 1029–1068.
Wood, A. M., Brown, G. D. A., Maltby, J., & Watkinson, P. (2012). How are personality judgments made? A cognitive model of reference group effects, personality scale responses, and behavioral reactions. Journal of Personality, 80, 1275–1311.
Youyou, W., Kosinski, M., & Stillwell, D. J. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, USA, 112, 112–116.

QDAFI method TLDR

Why is such a method necessary? Rationale: Memory drift will – over time – lead you to remember a highly simplified but grossly distorted caricature of the paper unless we get ahead of this process and distill the pithy, yet relevant essence (PYRE) of the paper up front. An added benefit of this process is that it will allow you to also understand the paper better right now. We can exploit the canonical structure of papers to facilitate active and profitable reading. This structure is mirrored in the QDAFI method, laid out as such:

Q: Each paper starts with a question that the authors set out to answer.
State this question explicitly, in your own words, so we can gauge understanding.

D: What did the authors do to answer the question? This should be on the level of what did they measure (y/DV) as a function of what they varied (x/IV), not more detailed. If they did a lot of stuff, what was the most important such x/y or IV/DV pairing regarding the question? Much of what is reported in papers are controls, internal replications, or tangents that were obviously requested by reviewers.

A: The rationale links the two – what was varied and what was measured. This is usually the trickiest part because it is often not obvious to students why scientists would do such things. For the dress paper (2017), the rationale is that if the lighting of the image is ambiguous, assumptions become important; and if people assume the lighting they have experienced more often, then we should look for an independent variable that would alter the proportion of long-wavelength exposure. Chronotype seems a good candidate, as that is stable in adult life, and everything else being equal, we can assume that owls are exposed to more artificial – long-wavelength – light than larks, so we would expect/predict them to see the dress as black/blue.

F: Literally, what is the big finding that was set up by D and A?

I: How did the authors answer the original question, given these findings? Sometimes, there are issues like confounds (not just limitations, which all studies have) that seriously challenge the interpretation by the authors, yielding a deviating interpretation – yours. This is – by the way – why explicitly stating the rationale is so important. There, the logical chain of the study will become apparent. If some of the assumptions don’t hold, or other causal links haven’t been conclusively ruled out, other interpretations become possible, if not more likely.
Honestly, whereas the full exposition of the method states that this should fit on a single page, with 1-3 sentences per item, all of this can – and needs to be, for skilled operators – fit on less than half of a page. Here is how: Q, D and F are literally a single, well-crafted sentence each. A can be a sentence, but is usually two: one for assumptions and one for predictions. I is a single sentence if the interpretation is legit, and two if it is not. The second sentence is for the statement of the issue (i.e. “SES is fundamentally confounded with trust”) and the alternative interpretation (i.e. “thus, we can’t rule out that we are simply measuring trust, not willpower”).

Parting words: None of this will be easy, if you have never done it before. But if it is hard for you, that very fact demonstrates the need for you to learn the method. Unlike newspaper articles, the meaning of most papers is not immediately obvious; you have to work for it. Happily, skills can be learned by deliberate practice. It will be well worth it in the long run.

9. How to Read a Scientific Article: The QDAFI Method of Structured Relevant Gist
Pascal Wallisch

Most academic fields document and communicate the knowledge gained from research in the form of articles published in professional journals. These publications are full of the latest information as to where the field is at the current moment and where it is likely to go next. They also constitute a historical record of how the empirical edifice of any given field came to be. Thus being able to gainfully read original research papers is an indispensable skill. Unless we are able to do that, we will have to rely on a summary by someone else, who might well have a limited understanding of the research and its implications, or have interests misaligned with those of the researchers, or a combination of both. Being able to access this information ourselves is important. But there are roadblocks. Lots of them.
Expert and Non-Expert Readers

Critical Reading Across the Curriculum, Volume 2: Social and Natural Sciences, First Edition. Edited by Anton Borst and Robert DiYanni. © 2020 John Wiley & Sons, Inc. Published 2020 by John Wiley & Sons, Inc.

Written by experts, for experts, scientific articles tend to be full of specialized jargon while also leaving many important things unsaid, as the authors can rely on the shared assumptions and tacit knowledge of the intended audience. However, there are times when non-experts want or need to read a research paper, as in the case of students, journalists, or simply interested laypeople. This presents a problem for these readers, as their lack of common ground will make much of what is in the paper seem like gibberish (Clark and Brennan, 1991). Even worse – especially for students – is that because we cannot remember what we cannot understand (Bartlett, 1932), even after heroically struggling through a paper, we will retain very little of this hard-won information long term. Moreover, authors attempt to convey to other researchers exactly what took place in the study – in principle one should be able to replicate the research solely from the information provided in the article itself, something that is an increasingly critical consideration (OSC, 2015). Consequently, research papers tend to be chock full of technical details, not all of which are equally important to the overall point of a paper. There is probably no correlation between how easy something is to understand in a paper and how important it is. Keeping in mind that people tend to remember what they are able to understand, and that non-experts won’t necessarily know what is important and what is not, some readers might remember particulars that are easy to understand, like that there were 57 participants in the second experiment, but which are of negligible significance.
This issue is exacerbated by the fact that the retention of minutiae likely comes at the expense of more relevant information. People generally tend not to retain an accurate and comprehensive memory of the information actually presented. Rather, our long-term memory performs a kind of compression operation, a compression that happens in a semantic space: we retain whatever meaning we are able to extract, but not much else. Known as memory for “gist,” what is remembered long term is often surprisingly sparse and a caricature of the original information (Reyna and Brainerd, 1995). In other words, most of the information encountered is discarded, including most – if not all – details of how it was presented, such as syntax, particular phrasing, font types, and the like.

These concerns illustrate how reading a research paper poses a formidable challenge to non-experts such as students or journalists. The first step in meeting and overcoming this challenge is to acknowledge its tremendous magnitude. Unless someone is a scientist in the same field as the author, one should not expect to be able to read a research paper as effortlessly as something written for a general audience, such as a piece in a newspaper or magazine. Non-expert readers should not expect to be able to get anything useful out of reading research papers willy-nilly. The good news is that being able to read a research paper is a skill, and skills can be acquired. For this purpose, I have developed the “QDAFI” method. The QDAFI method enables non-expert readers to gainfully read research papers with a good deal of understanding. It was designed to exploit the typical structure of research papers, as well as to leverage the cognitive aspects of reading research papers, some of which were outlined above.
For instance, if we know that readers have a tendency to retain only the gist of some passage, we can anticipate this and get ahead of it by trying to make sure that the gist that is being retained is the relevant gist – what matters most about the article in question from the perspective of someone trying to remember it in the future. There are other considerations as well. Research in cognitive psychology suggests that the format of information, which shapes how that information is encoded, is of critical importance for retrieval (Craik and Tulving, 1975; Roediger and Karpicke, 2006). As we usually want to retrieve information from a paper in the form of a targeted question (such as “what did the study find?”), it makes sense to encode the information in the form of answers to the handful of questions most likely to be asked. Every research paper, regardless of field or content, is more or less structured in a way conducive to this approach. The QDAFI method amounts to a mining operation: it will require a bit of work, but we’re essentially trying to extract nuggets of relevant information from the paper by digging for them in places where we know that the authors put them, sometimes buried under jargon or masses of extraneous information that is of interest only to experts. The QDAFI method yields, deliberately, only gist, and gist of the kind that we’ll need if we are to remember what matters and discuss the paper in the future – in other words, structured and relevant gist. The method consists of answering five questions that are the same for every paper – and in my view the most important to ask of any paper. These questions structure the relevant gist and tend to correspond to the structure of the article itself, which provides an important guide for the non‐expert reader. The answers must be brief, between one (ideally) and three (maximally) sentences each, together fitting on a single page (without fiddling with the margins or fonts). 
This constraint necessarily enforces the discipline needed to focus on the most relevant information: the gist, which is all that readers remember anyway. Putting in the necessary work to condense our understanding of a paper to a few sentences does wonders for our grasp of its purpose and scientific relevance.

The QDAFI Method: An Overview

Specifically, the QDAFI is a brief summary that consists of the answers to the following five questions:

Q: What was the question that the authors tried to answer?
D: What did the authors do to answer the question?
A: What was the authors’ rationale?
F: What did the authors find?
I: What is the authors’ interpretation of these findings with regard to the initial question?

Let us now look at each item of the QDAFI (and how to address it) in detail.

Q: What question did the authors try to answer in this paper?

Every good research paper starts with a question it is trying to answer. It is worth restating this question explicitly and as succinctly as possible in the first section of the QDAFI, as without it none of the other sections will make any sense. Whatever the authors did, they presumably did it to answer this question. The Q portion of the QDAFI method can usually be addressed by looking at the Introduction section of papers. If we are lucky, the authors state their question outright; if not, and much more commonly, we will have to read between the lines to reconstruct what we think the authors had in mind. One complication is that there might actually be two questions explicitly or implicitly stated in the paper. The one question that authors will always state explicitly is the “specific research question” the study was designed and executed to answer.
This question could be anything – for example, “What is the influence of changing font color on memory retention?” The second question, and the one that usually pervades the spirit of the introduction, is the “theoretical question.” This question is typically the reason why the study was done in the first place. A study is typically not done arbitrarily, but touches on some larger theoretical issue or controversy in the field – for example, “What is the influence of context on memory?” So section Q should ideally consist of one sentence stating the specific research question, or two sentences if we manage to spot the theoretical question as well. No more.

Consider (for now, not when doing the QDAFI) all the things that are typically in an introduction, but won’t make it into the QDAFI because they are irrelevant for that purpose. For instance, introductions often spend a lot of effort on describing why the question is important in order to make it more likely for the paper to be published in a prestigious journal. But for our purposes, this is irrelevant. Presumably readers are interested in a particular research paper and know why it is important to them. Also, introductions tend to dwell on all the other things that have already been done to answer the theoretical question, highlighting why the authors think more research needed to be done (i.e. their paper). None of this academic self-justification is necessary for purposes of the QDAFI. Realizing this can be liberating, as non-experts in particular can be intimidated by the wall-to-wall citations common in many introductions, none of which the non-expert has read or needs to be concerned with.

D: What did the authors do to answer the question as reported in this paper?

In section D of the QDAFI, we want to state as succinctly as possible what the authors actually did in order to answer their research question. This information can be found in the Method section of the paper.
However, finding it is tricky, as this section tends, along with Results, to be the densest of the entire paper, full of technical details of interest to experts. It is very easy to get lost in irrelevant minutiae here. How are we supposed to summarize all of this information in a single sentence, or at most two sentences? It is not easy, but it is doable. For studies that report experiments, one sentence can present the parameters that were varied (the independent variables), and the other what was measured (the dependent variables). If the study is observational, we can devote one sentence to what was measured and one to the conditions under which the measurements were performed. Unless a detail is absolutely critical to the research question and the interpretation of results, and unless that detail makes a difference for remembering the study in a couple of years, it does not belong under D. We need to be extremely stingy in terms of what is included here. If the paper reports on multiple experiments, a third sentence can be used to outline how the conditions and measurements changed in each (e.g. “they controlled for potential confounding variables x, y and z”).

We need to be only as specific as capturing the essence of the experiment requires. For instance, when reporting on what was varied, we may need only record the independent variables – e.g., “they varied luminance levels.” But if a greater level of detail is the point of the experiment, because the authors were the first to achieve, for example, the specific luminance levels of “5, 10, 15, and 20 foot-lamberts,” then we would note that information. The goal is to not get lost in the details, yet at the same time not leave out anything crucial. Unless it is critical to the question that the study asked, details such as the precise numbers of participants, their gender, etc., do not belong here.
Again, as in section Q, recognizing this distinction between what is essential and what is not can be a great relief.

A: What was the authors’ rationale?

This section is perhaps the trickiest of all, yet it is at the heart of the QDAFI method. Stating the rationale necessitates that readers understand what the authors were trying to do and paraphrase it in their own words. If we can state the rationale, we probably have a good grasp of the paper as a whole. While the other sections Q, D, F, and I can sometimes be completed by more or less copying from the paper, A often requires inferring an unstated purpose. So the rationale is crucial, indeed. It answers questions central to the meaning of the authors’ work, such as “What was the idea behind the study?” and “What is its logic?” “What allows the authors to infer anything useful about the question from what they did?” And: “Why did the authors do what they did in D and not something else, out of the infinite number of things they could have done to answer their research question?” The rationale or idea links the specific research question to the actual methods the authors used. Done properly, the rationale sets up what the authors can conclude from any possible outcome, given their methods and question. Without a valid rationale, nothing can be concluded from any given result. Contrary to popular belief, the data do not speak for themselves. Rationales are rarely spelled out explicitly, as either the experts in the intended audience will grasp them intuitively, or a given rationale might be part of the shared culture of a field. Rationales can be tricky to track down, but the information necessary to make an informed guess about them is usually distributed in the Introduction and Method sections. For instance, the authors of a study might have a theory that more intelligent people have faster neurotransmission, which they discuss in their introduction.
If this were the case, they could predict that such people are quicker to respond to a stimulus, and so the authors have designed an experiment to test the relationship between intelligence and response-time to a stimulus, which they explain in their Method section. Our A section of the QDAFI might therefore read: It is believed that intelligent people have faster neurotransmission. Faster neurotransmission would be evident in faster response-times to a stimulus. With this rationale, we can now meaningfully interpret the results of our experiments: if a significant number of intelligent people are slower, this falsifies the theory, whereas if intelligent people were found to be consistently faster, it would provide empirical support in favor of the theory. Without a rationale, not much can be concluded from measuring reaction times and intelligence – it might even seem downright arbitrary to do so. The stronger the rationale, the more conclusive the results will be, whether for or against the theoretical issue at stake. Depending on the complexity of the rationale of the study, this section might require anywhere from one to three sentences, but not more.

F: What did the authors find?

This is perhaps the most straightforward section of the entire QDAFI. What the authors found as the results of their study is usually reported in the Results section of the paper, and quite explicitly so. The biggest challenge here is to identify the key findings relevant to the question the authors asked initially and to understand them. Those findings are usually expressed in extremely technical language, often involving statistical notation. Again, the discipline imposed on us by the QDAFI method comes to the rescue. It forces us to identify the core of the matter. Which of the many results reported in a paper (the average paper in psychology contains no fewer than 11 p-values [Nuijten et al., 2015]) are actually relevant to the original question?
Some authors like to build up to their main result; others hedge it with many side-considerations. Both approaches may leave readers with lots of information to sift through before identifying the most important findings. Additionally, peer-reviewers who vet papers before publication often request additional analyses of personal interest to them, which authors may then obligingly incorporate in order to ensure their work is published. This adds even more extraneous information. Readers must find the key finding – the finding that matters in terms of the original question. Having completed A, the rationale, we can now determine which finding or pattern of findings to look for in the Results section. Usually, this key finding can be stated in a single sentence. Sometimes, if the findings are particularly surprising or complicated, an additional sentence will be needed to elaborate. Never are more than three sentences necessary. Given the plethora of statistics typically reported by authors, such brevity can only be accomplished by focusing on the big picture – what matters in terms of the question, as spelled out by the rationale. The key is to focus on the facts. What did the authors actually find? This should be stated as dispassionately as possible, without mentioning anything about what these findings might mean. That is for the next, and last, section.

I: What is the interpretation of these findings with regard to the initial question?

Put simply, what do the authors think their findings mean? Answering this question is usually straightforward, as the authors will tell you what they think in the Discussion section of the paper. But that's not all there is to the QDAFI's "I," which consists of one to three sentences. The first sentence records what the authors think the findings mean in terms of the original question. The second sentence identifies an issue that threatens this interpretation.
The third states what we, the readers, think these findings mean. Thus there are potentially three I's to be addressed in this section: the interpretation by the authors, the issues, and the interpretation by the reader. Suppose that some authors want to know whether the ability to delay gratification in childhood is a major predictor of life success (Mischel et al., 1989), and imagine that they find that it is. They then conclude that this finding suggests that the ability to forego small but immediate rewards for large but delayed ones is the bedrock of individual success, as well as social functioning. If we agree with this interpretation, we are done with this section in one sentence. However, it is possible that upon examination the authors' conclusions appear problematic. For instance, they might neglect to measure the level of trust of the experimental participants – trust that the delayed reward will in fact materialize, as the experimenter promises. It has been shown that children from lower socio-economic-status (SES) backgrounds have lower levels of trust in promised future outcomes. In what for them is an unreliable world, going for the safe, immediate outcome might be the most rational strategy. In addition, low childhood SES is linked to poor adult life outcomes. In other words, a major potential confound has not been accounted for by the authors. While interpreted as the ability to delay gratification, what may actually be measured is trust level as a proxy for SES, which is already known to be linked to adult life outcomes. Thus, having identified this problem with the authors' interpretation of their findings, we may now offer our own: unless ruled out in a future study, low childhood SES is linked to poor adult outcomes, likely mediated by trust. So far, so good.
The key problem for this section is to identify an issue, the second "I." To be worth identifying and recording on the QDAFI, an issue must be so serious that it fundamentally threatens the authors' interpretation of their experiment's outcome – not some minor technical limitation. It should be understood that every research paper suffers from a large number of problems. This is necessarily the case, as we live in a non-ideal world. Life is complicated. Things don't always work out. However, the issues that the QDAFI method leads us to focus on represent only problems that are true showstoppers. A real issue in this sense is usually something like faulty logic (e.g., the authors interpret a correlational study causally, when they should have done an experiment); a potential confounding variable like the one we outlined above; or a serious technical problem (e.g., the method the authors used does not have the sensitivity to reliably detect the results the authors claim they found). It is not something trivial like the number of participants, as we always would like to have more research participants (unless the study is seriously underpowered [Wallisch, 2015]), or the size of the monitor on which the stimuli were presented (unless it was really too small for the participants to reliably see the stimuli), or other minor technical issues. All studies have limitations, but not necessarily issues that threaten the conclusions (I) authors draw from their findings (F) with respect to their question (Q). It might not be reasonable or feasible for non-experts to find such fundamental flaws in a study. Instead of manufacturing issues, non-expert readers are advised to stick with the interpretation of the authors, and to succinctly state it in one sentence with a focus on the big-ticket item: what do these results mean with respect to the empirical question at hand?
There are often plenty of considerations raised by any given study, but it is best to focus on the biggest one, the one described in section Q. And that's it – that's the QDAFI in all its purposeful simplicity. Done right, it will easily fit on a single page and be memorable for years to come. In this case, less is definitely more. Ideally, each of the five sections of the QDAFI should be tweetable (using Twitter's original limit of 140 characters), but this is not a strict requirement.

Benefits of the QDAFI Method

Once completed a few times, the QDAFIs will reveal their usefulness. Being able to do a QDAFI is a beneficial skill for mainly three reasons. First, research papers are not written in a fashion that allows them to be read like any other writing. This is a problem, as they look very much like other written works, but treating them as such won't provide much value in the reading experience. Second, long-term memory is radically semantically reductive over time. Short-term memory can be deceptive because of this. After just having carefully read a paper, all the details will be readily available to our minds (assuming we have understood it). But this memory won't last. Most, if not all, of the details will be forgotten over the course of a few days, weeks, and months. The only thing remembered a couple of years later is what most caught our attention when we read it. This is a problem, especially if our brain at the time thought that the most salient or remarkable thing about the paper was that all the authors' last names started with a "G." Expertise in any field consists of building up a database of highly semantic knowledge over time. This is, by the way, not a downside of memory, but rather a benefit: the brain needs to make sure that it is not getting too cluttered with irrelevant information.
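As an aside, the five-section template and the tweetable-length guideline described above can be sketched as a simple record type with a length check. This is purely illustrative and not part of the QDAFI method itself; the class and method names below are hypothetical, and the sample text paraphrases the note-taking study discussed later in this chapter.

```python
from dataclasses import dataclass, fields

TWEET_LIMIT = 140  # Twitter's original character limit, the chapter's soft guideline

@dataclass
class QDAFI:
    question: str        # Q: what the researchers asked
    did: str             # D: what they did (IVs, DVs, measurements)
    rationale: str       # A: the idea linking question to method
    findings: str        # F: what they found, stated dispassionately
    interpretation: str  # I: what the findings mean (plus any issue)

    def over_limit(self):
        """Return the names of sections that exceed the tweetable guideline."""
        return [f.name for f in fields(self)
                if len(getattr(self, f.name)) > TWEET_LIMIT]

summary = QDAFI(
    question="Does taking class notes by hand yield better academic performance than on a laptop?",
    did="Participants took notes by hand or on a laptop while watching TED talks, then answered factual and conceptual questions.",
    rationale="Even without distractions, laptop use may encourage shallower processing, hurting conceptual performance.",
    findings="Laptop note-takers did worse on conceptual, but not factual, questions.",
    interpretation="Laptops seem to encourage shallow, verbatim note-taking, which impairs conceptual learning.",
)
print(summary.over_limit())  # prints: [] – every section is tweetable
```

The point of the sketch is simply that each of the five sections is a single short string; if `over_limit()` returns a non-empty list, the flagged sections probably contain details that belong back in the paper, not in the summary.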
The QDAFI anticipates this housecleaning business and makes sure that the most important points about the paper are also the ones focused on when "reading" the paper. And that is why it is critical that the QDAFI be so short – the shorter the better. Third, being able to complete a QDAFI has benefits beyond reading and remembering individual papers, providing tools for conducting research, synthesizing disciplinary knowledge, and producing our own research articles. The method offers a very efficient technique for developing literature reviews and summaries, which are essential for writing review papers and introductions to research. Because the QDAFI maximizes information transmission with the minimum amount of text in a clear and cogent way, it provides an effective template for structuring abstracts of original research. Finally, it can begin to inform how we write our own studies and papers, to make them more memorable to readers who may be applying, consciously or not, their own version of QDAFI to our work.

A Demonstration

So far, our discussion of the QDAFI method has been fairly abstract (though with occasional examples). Here I present sample solutions to two papers that are publicly and freely available online and that are of general interest. The first paper is technically straightforward, and investigates whether taking notes by hand versus taking them on a laptop matters for retention of the presented material. The second paper is much more technical, and explores whether individuals with autism exhibit any perceptual benefits. Thus, the choice of these papers illustrates the versatility of the QDAFI method, which works effectively regardless of a paper's specific content or technical sophistication. If you are interested in how the relevant information in long and complicated papers can be extracted and end up in QDAFI form, have a look at the original papers and then the sample QDAFIs below.

Paper 1: Mueller, P.A., and Oppenheimer, D.M. (2014).
The pen is mightier than the keyboard: Advantages of longhand over laptop note taking. Psychological Science, 25 (6), 1159–1168.

Q: Does taking class notes by hand yield better academic performance than taking them on a laptop?

D: Participants were asked to either take notes on a laptop or by hand while watching TED talks. They were then asked to answer both factual and conceptual questions about the material presented in the talks.

A: It is believed that using laptops to take notes yields poor performance because of distractions. However, even without distractions, if laptop use leads to shallower processing, it could detrimentally impact performance on conceptual questions.

F: Students who typed notes on a computer performed worse on conceptual but not factual questions compared to those who wrote them by hand.

I: Taking notes on a laptop negatively affects performance in response to conceptual questions compared to taking notes by hand, which could be due to the fact that taking notes on a laptop seems to encourage shallow processing, such as copying the material verbatim.

Paper 2: Foss-Feig, J.H., Tadin, D., Schauder, K.B., and Cascio, C.J. (2013). A substantial and unexpected enhancement of motion perception in autism. Journal of Neuroscience, 33 (19), 8243–8249.

Q: Do autistic individuals exhibit enhanced motion perception? Are there reliable psychophysical markers of autism that could transcend subjective diagnostics?

D: Groups of autistic children and controls were shown moving gratings varying in size and contrast. The authors measured how long the stimuli had to be shown until perceived accurately by the study participants.

A: It is believed that autistic individuals might suffer from a deficit in inhibitory neurotransmitters. If this is the case, they should show less spatial suppression, which is believed to rely on inhibition, and therefore be quicker to see large stimuli.
F: Autistic individuals were quicker to perceive motion at all stimulus sizes, but they were not better at low contrasts.

I: It is not the case that autistic individuals show less spatial suppression – rather, the low-contrast results suggest that there is abnormal gain control in autism. The authors neither measure nor report reaction times; as autistic individuals are known to exhibit longer reaction times, it cannot be ruled out that the results are simply due to a speed/accuracy tradeoff. This does not mean that duration thresholds cannot be used as a psychophysical marker of autism, but it might not be for the reasons the authors think. (For a more detailed discussion of this issue, see Wallisch and Bornstein, 2013.)

Conclusion

For students, doing a QDAFI is hard – really hard, especially in the beginning. But they should not be discouraged. The best way for students to learn this method is simply to do it. They will improve with practice, which is well worth the effort. If it isn't already clear from the foregoing discussion, rest assured that reports so far suggest that learning how to write a QDAFI is an extremely valuable skill. Students in classes where I taught the QDAFI have emailed me, sometimes years later, to say this was indeed the most valuable thing they learned. I am currently trying to figure out how to ethically study the effectiveness of the QDAFI empirically, which is tricky. As we know how helpful the QDAFI is, it would be unethical to teach one group of students how to do this, but not another, and then test their retention of articles at the end of the class. Conversely, if we required groups of students to do QDAFIs versus free-form summaries of papers in alternating weeks, we would have to teach the QDAFI to both groups, and could not rule out that the free-form group also implicitly applied the QDAFI method in any given week.
Thus the comparison would perhaps only be valuable for the very first week, before the free-form group was taught how to do the QDAFI. But that seems like a lot of work for little useful data. Regardless, I believe that it works, due to the principles laid out above, and because of my experience using the QDAFI myself, as well as my experience teaching it to my students. I hope you will find this method useful as well.

Acknowledgments

This method was originally inspired by Stephen Kosslyn's "QuALMRI" technique.

References

Bartlett, F.C. (1932). Remembering: A study in experimental and social psychology. Cambridge: Cambridge University Press.

Clark, H.H., and Brennan, S.E. (1991). Grounding in communication. In Perspectives on socially shared cognition. Ed. L.B. Resnick, J.M. Levine, and S.D. Teasley. American Psychological Association.

Craik, F.I., and Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104 (3), 268–294.

Foss-Feig, J.H., Tadin, D., Schauder, K.B., and Cascio, C.J. (2013). A substantial and unexpected enhancement of motion perception in autism. Journal of Neuroscience, 33 (19), 8243–8249.

Mischel, W., Shoda, Y., and Rodriguez, M.L. (1989, May 26). Delay of gratification in children. Science, 244 (4907), 933–938.

Mueller, P.A., and Oppenheimer, D.M. (2014). The pen is mightier than the keyboard: Advantages of longhand over laptop note taking. Psychological Science, 25 (6), 1159–1168.

Nuijten, M.B., Hartgerink, C.H., Assen, M.A., Epskamp, S., and Wicherts, J.M. (2015). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48 (4).

Open Science Collaboration (OSC). (2015, August 28). Estimating the reproducibility of psychological science. Science, 349 (6251), aac4716.

Reyna, V.F., and Brainerd, C.J. (1995). Fuzzy-trace theory: An interim synthesis. Learning and Individual Differences, 7 (1), 1–75.
Roediger, H.L., and Karpicke, J.D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17 (3), 249–255.

Wallisch, P., and Bornstein, A.M. (2013). Enhanced motion perception as a psychophysical marker for autism? Journal of Neuroscience, 33 (37), 14631–14632.

Wallisch, P. (2015). Brighter than the sun: Powerscape visualizations illustrate power needs in neuroscience and psychology. arXiv preprint arXiv:1512.09368.

A Sample Answer For the Assignment: IVs and DVs

Title: IVs and DVs