Assignment: Descriptive Inferential Statistics
Calculating Descriptive Statistics
There are two major classes of statistics: descriptive statistics and inferential statistics. Descriptive statistics are computed to reveal characteristics of the sample data set and to describe study variables. Inferential statistics are computed to gain information about effects and associations in the population being studied. For some types of studies, descriptive statistics will be the only approach to analysis of the data. For other studies, descriptive statistics are the first step in the data analysis process, to be followed by inferential statistics.
For all studies that involve numerical data, descriptive statistics are crucial in understanding the fundamental properties of the variables being studied. Exercise 27 focuses only on descriptive statistics and will illustrate the most common descriptive statistics computed in nursing research and provide examples using actual clinical data from empirical publications.
MEASURES OF CENTRAL TENDENCY
A measure of central tendency is a statistic that represents the center or middle of a frequency distribution. The three measures of central tendency commonly used in nursing research are the mode, median (MD), and mean ($\bar{X}$). The mean is the arithmetic average of all of a variable’s values. The median is the exact middle value (or the average of the middle two values if there is an even number of observations). The mode is the most commonly occurring value or values (see Exercise 8). The following data have been collected from veterans with rheumatoid arthritis (Tran, Hooker, Cipher, & Reimold, 2009).
The values in Table 27-1 were extracted from a larger sample of veterans who had a history of biologic medication use (e.g., infliximab [Remicade], etanercept [Enbrel]). Table 27-1 contains data collected from 10 veterans who had stopped taking biologic medications, and the variable represents the number of years that each veteran had taken the medication before stopping. Because the number of study subjects represented below is 10, the correct statistical notation to reflect that number is: n = 10. Note that the n is lowercase, because we are referring to a sample of veterans. If the data being presented represented the entire population of veterans, the correct notation would be the uppercase N.
Because most nursing research is conducted using samples, not populations, all formulas in the subsequent exercises will incorporate the sample notation, n.
Mode
The mode is the numerical value or score that occurs with the greatest frequency; it does not necessarily indicate the center of the data set. The data in Table 27-1 contain two modes: 1.5 and 3.0. Each of these numbers occurred twice in the data set. When two modes exist, the data set is referred to as bimodal; a data set that contains more than two modes would be multimodal.
Median
The median (MD) is the score at the exact center of the ungrouped frequency distribution. It is the 50th percentile. To obtain the MD, sort the values from lowest to highest. If the number of values is odd, the MD is the middle value, with an equal number of values above and below it. If the number of values is even, the MD is the average of the two middle values.
Thus the MD may not be an actual value in the data set. For example, the data in Table 27-1 consist of 10 observations, and therefore the MD is calculated as the average of the two middle values.

$$MD = \frac{1.5 + 2.0}{2} = 1.75$$

Mean
The most commonly reported measure of central tendency is the mean. The mean is the sum of the scores divided by the number of scores being summed. Thus, like the MD, the mean may not be a member of the data set.
The formula for calculating the mean is as follows:

$$\bar{X} = \frac{\sum X}{n}$$

where
$\bar{X}$ = mean
$\sum$ = sigma, the statistical symbol for summation
X = a single value in the sample
n = total number of values in the sample

The mean number of years that the veterans used a biologic medication is calculated as follows:

$$\bar{X} = \frac{0.1 + 0.3 + 1.3 + 1.5 + 1.5 + 2.0 + 2.2 + 3.0 + 3.0 + 4.0}{10} = 1.89 \approx 1.9 \text{ years}$$

TABLE 27-1 DURATION OF BIOLOGIC USE AMONG VETERANS WITH RHEUMATOID ARTHRITIS (n = 10)
Duration of Biologic Use (years): 0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0
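The three measures above can be checked with a few lines of Python. This is only a sketch using the standard library `statistics` module on the Table 27-1 values; the variable names are illustrative and are not part of the original exercise.

```python
import statistics

# Duration of biologic use in years, copied from Table 27-1.
durations = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]

n = len(durations)                        # sample size, n = 10
modes = statistics.multimode(durations)   # every value tied for the highest frequency
median = statistics.median(durations)     # average of the two middle values when n is even
mean = statistics.mean(durations)         # sum of the values divided by n

print(f"n = {n}")
print(f"modes = {modes}")
print(f"median = {median}")
print(f"mean = {mean:.2f}")
```

Because two values tie for the highest frequency, `multimode` returns both of them, which matches the description of a bimodal data set.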
The mean is an appropriate measure of central tendency for approximately normally distributed populations with variables measured at the interval or ratio level. It is also appropriate for ordinal level data such as Likert scale values, where higher numbers represent more of the construct being measured and lower numbers represent less of the construct (such as pain levels, patient satisfaction, depression, and health status). The mean is sensitive to extreme scores such as outliers. An outlier is a value in a sample data set that is unusually low or unusually high in the context of the rest of the sample data. An example of an outlier in the data presented in Table 27-1 might be a value such as 11.
The existing values range from 0.1 to 4.0, meaning that no veteran used a biologic beyond 4 years. If an additional veteran were added to the sample and that person used a biologic for 11 years, the mean would be much larger: 2.7 years. Simply adding this outlier to the sample raised the mean by more than 40% (from 1.9 to 2.7 years). The outlier would also change the frequency distribution. Without the outlier, the frequency distribution is approximately normal, as shown in Figure 27-1. Including the outlier changes the shape of the distribution to appear positively skewed. Although the use of summary statistics has been the traditional approach to describing data or describing the characteristics of the sample before inferential statistical analysis, its ability to clarify the nature of data is limited.
For example, using measures of central tendency, particularly the mean, to describe the nature of the data obscures the impact of extreme values or deviations in the data. Thus, significant features in the data may be concealed or misrepresented. Often, anomalous, unexpected, or problematic data and discrepant patterns are evident, but are not regarded as meaningful. Measures of dispersion, such as the range, difference scores, variance, and standard deviation (SD), provide important insight into the nature of the data.
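To make the outlier example concrete, the following sketch adds the hypothetical 11-year value to the Table 27-1 data and recomputes the mean. The median comparison is an addition for illustration and is not part of the original exercise.

```python
import statistics

# Table 27-1 values, then the same sample with the hypothetical 11-year outlier.
durations = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]
with_outlier = durations + [11.0]

mean_before = statistics.mean(durations)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(durations)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```

A single extreme value moves the mean far more than it moves the median, which is the sensitivity to outliers described above.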
MEASURES OF DISPERSION
Measures of dispersion, or variability, are measures of individual differences of the members of the population and sample. They indicate how values in a sample are dispersed around the mean. These measures provide information about the data that is not available from measures of central tendency. They indicate how different the scores are, that is, the extent to which individual values deviate from one another. If the individual values are similar, measures of variability are small and the sample is relatively homogeneous in terms of those values. Heterogeneity (wide variation in scores) is important in some statistical procedures, such as correlation. Heterogeneity is determined by measures of variability. The measures most commonly used are range, difference scores, variance, and SD (see Exercise 9).
FIGURE 27-1 FREQUENCY DISTRIBUTION OF YEARS OF BIOLOGIC USE, WITHOUT OUTLIER AND WITH OUTLIER. [Two histograms; horizontal axis: years of biologic use; vertical axis: frequency.]
Range
The simplest measure of dispersion is the range. In published studies, range is presented in two ways: (1) the range is the lowest and highest scores, or (2) the range is calculated by subtracting the lowest score from the highest score. The range for the scores in Table 27-1 is 0.1 and 4.0, or it can be calculated as follows: 4.0 − 0.1 = 3.9. In this form, the range is a difference score that uses only the two extreme scores for the comparison.
The range is generally reported but is not used in further analyses.
Difference Scores
Difference scores are obtained by subtracting the mean from each score. Sometimes a difference score is referred to as a deviation score because it indicates the extent to which a score deviates from the mean. Of course, most variables in nursing research are not “scores,” yet the term difference score is used to represent a value’s deviation from the mean.
The difference score is positive when the score is above the mean, and it is negative when the score is below the mean (see Table 27-2). Difference scores are the basis for many statistical analyses and can be found within many statistical equations. The formula for difference scores is:

$$X - \bar{X}$$

TABLE 27-2 DIFFERENCE SCORES OF DURATION OF BIOLOGIC USE

| X | $\bar{X}$ | $X - \bar{X}$ |
|---|---|---|
| 0.1 | 1.9 | −1.8 |
| 0.3 | 1.9 | −1.6 |
| 1.3 | 1.9 | −0.6 |
| 1.5 | 1.9 | −0.4 |
| 1.5 | 1.9 | −0.4 |
| 2.0 | 1.9 | 0.1 |
| 2.2 | 1.9 | 0.3 |
| 3.0 | 1.9 | 1.1 |
| 3.0 | 1.9 | 1.1 |
| 4.0 | 1.9 | 2.1 |

Σ of absolute values: 9.5

The mean deviation is the average difference score, using the absolute values. The formula for the mean deviation is:

$$\bar{X}_{\text{deviation}} = \frac{\sum |X - \bar{X}|}{n}$$
In this example, the mean deviation is 0.95. This value was calculated by taking the sum of the absolute value of each difference score (1.8, 1.6, 0.6, 0.4, 0.4, 0.1, 0.3, 1.1, 1.1, 2.1) and dividing by 10. The result indicates that, on average, subjects’ duration of biologic use deviated from the mean by 0.95 years.
Variance
Variance is another measure commonly used in statistical analysis. The equation for a sample variance ($s^2$) is below.

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Note that the lowercase letter $s^2$ is used to represent a sample variance. The lowercase Greek sigma ($\sigma^2$) is used to represent a population variance, in which the denominator is N instead of n − 1. Because most nursing research is conducted using samples, not populations, formulas in the subsequent exercises that contain a variance or standard deviation will incorporate the sample notation, using n − 1 as the denominator.
Moreover, SPSS and many other statistical software packages compute the variance and standard deviation using the sample formulas by default, not the population formulas. The variance is never negative and has no upper limit. In general, the larger the variance, the larger the dispersion of scores. The variance is most often computed to derive the standard deviation because, unlike the variance, the standard deviation reflects important properties about the frequency distribution of the variable it represents.
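The difference between the two denominators can be seen directly in Python's standard library, which offers both versions. This is a sketch on the Table 27-1 values; `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by N.

```python
import statistics

# Duration of biologic use in years, copied from Table 27-1.
durations = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]

sample_var = statistics.variance(durations)       # denominator n - 1 (sample formula)
population_var = statistics.pvariance(durations)  # denominator N (population formula)

print(f"sample variance     = {sample_var:.3f}")
print(f"population variance = {population_var:.3f}")
```

For the same data the sample variance is always the larger of the two, because it divides the same sum of squares by a smaller number.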
Table 27-3 displays how we would compute a variance by hand, using the biologic duration data.

$$s^2 = \frac{13.41}{9}$$

$$s^2 = 1.49$$

TABLE 27-3 VARIANCE COMPUTATION OF BIOLOGIC USE

| X | $\bar{X}$ | $X - \bar{X}$ | $(X - \bar{X})^2$ |
|---|---|---|---|
| 0.1 | 1.9 | −1.8 | 3.24 |
| 0.3 | 1.9 | −1.6 | 2.56 |
| 1.3 | 1.9 | −0.6 | 0.36 |
| 1.5 | 1.9 | −0.4 | 0.16 |
| 1.5 | 1.9 | −0.4 | 0.16 |
| 2.0 | 1.9 | 0.1 | 0.01 |
| 2.2 | 1.9 | 0.3 | 0.09 |
| 3.0 | 1.9 | 1.1 | 1.21 |
| 3.0 | 1.9 | 1.1 | 1.21 |
| 4.0 | 1.9 | 2.1 | 4.41 |
| Σ | | | 13.41 |
Standard Deviation
Standard deviation is a measure of dispersion that is the square root of the variance. The standard deviation is represented by the notation s or SD. The equation for obtaining a standard deviation is

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Table 27-3 displays the computations for the variance. To compute the SD, simply take the square root of the variance. We know that the variance of biologic duration is $s^2$ = 1.49. Therefore, the s of biologic duration is SD = 1.22. The SD is an important statistic, both for understanding dispersion within a distribution and for interpreting the relationship of a particular value to the distribution.
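The hand computations in Tables 27-2 and 27-3 can be retraced step by step. The sketch below assumes the mean is rounded to 1.9, as it is in the tables, and rounds each difference score to one decimal place so that the intermediate values match the printed ones; the variable names are illustrative.

```python
import math

# Duration of biologic use in years, copied from Table 27-1.
durations = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]
n = len(durations)

# The tables use the mean rounded to one decimal place (1.9).
mean = round(sum(durations) / n, 1)

# Difference (deviation) scores: each value minus the mean.
diffs = [round(x - mean, 1) for x in durations]

# Mean deviation: average of the absolute difference scores.
mean_deviation = sum(abs(d) for d in diffs) / n

# Sample variance: sum of squared difference scores divided by n - 1.
sum_of_squares = sum(d ** 2 for d in diffs)
variance = sum_of_squares / (n - 1)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(f"difference scores = {diffs}")
print(f"mean deviation    = {mean_deviation:.2f}")
print(f"sum of squares    = {sum_of_squares:.2f}")
print(f"variance          = {variance:.2f}")
print(f"SD                = {sd:.2f}")
```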
A standard error describes the extent of sampling error. For example, a standard error of the mean is calculated to determine the magnitude of the variability associated with the mean. A small standard error is an indication that the sample mean is close to the population mean, while a large standard error yields less certainty that the sample mean approximates the population mean. The formula for the standard error of the mean ($s_{\bar{X}}$) is:

$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

Using the biologic medication duration data, we know that the standard deviation of biologic duration is s = 1.22. Therefore, the standard error of the mean for biologic duration is computed as follows:

$$s_{\bar{X}} = \frac{1.22}{\sqrt{10}}$$

$$s_{\bar{X}} = 0.39$$

The standard error of the mean for biologic duration is 0.39.
Confidence Intervals
To determine how closely the sample mean approximates the population mean, the standard error of the mean is used to build a confidence interval. For that matter, a confidence interval can be created for many statistics, such as a mean, proportion, and odds ratio. To build a confidence interval around a statistic, you must have the standard error value and the t value to adjust the standard error.
The degrees of freedom (df) to use to compute a confidence interval is df = n − 1. To compute the confidence interval for a mean, the lower and upper limits of that interval are created by multiplying the $s_{\bar{X}}$ by the t statistic, where df = n − 1, and then subtracting the product from and adding it to the mean. For a 95% confidence interval, the t value should be selected at α = 0.05 (two-tailed). For a 99% confidence interval, the t value should be selected at α = 0.01 (two-tailed). Using the biologic medication duration data, we know that the standard error of the mean duration of biologic medication use is $s_{\bar{X}}$ = 0.39. The mean duration of biologic medication use is 1.89. Therefore, the 95% confidence interval for the mean duration of biologic medication use is computed as follows:

$$\bar{X} \pm s_{\bar{X}}\, t$$

$$1.89 \pm (0.39)(2.26)$$

$$1.89 \pm 0.88$$

As referenced in Appendix A, the t value required for the 95% confidence interval with df = 9 is 2.26. The computation above results in a lower limit of 1.01 and an upper limit of 2.77. This means that our confidence interval of 1.01 to 2.77 estimates the population mean duration of biologic use with 95% confidence (Kline, 2004).
Technically and mathematically, it means that if we drew an infinite number of samples of veterans and computed a 95% confidence interval for the mean duration of biologic medication use from each sample, 95% of the intervals would contain the true population mean, and 5% would not contain the population mean (Gliner, Morgan, & Leech, 2009). If we were to compute a 99% confidence interval, we would require the t value that is referenced at α = 0.01. Therefore, the 99% confidence interval for the mean duration of biologic medication use is computed as follows:

$$1.89 \pm (0.39)(3.25)$$

$$1.89 \pm 1.27$$
As referenced in Appendix A, the t value required for the 99% confidence interval with df = 9 is 3.25. The computation above results in a lower limit of 0.62 and an upper limit of 3.16. This means that our confidence interval of 0.62 to 3.16 estimates the population mean duration of biologic use with 99% confidence.
Degrees of Freedom
The concept of degrees of freedom (df) was used in reference to computing a confidence interval. For any statistical computation, degrees of freedom are the number of independent pieces of information that are free to vary in order to estimate another piece of information (Zar, 2010). In the case of the confidence interval, the degrees of freedom are n − 1. This means that there are n − 1 independent observations in the sample that are free to vary (to be any value) to estimate the lower and upper limits of the confidence interval.
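The standard error and both confidence intervals can be reproduced with the sketch below. It assumes the critical t values quoted in the text for df = 9 (2.26 and 3.25) rather than looking them up from a distribution, and the helper name `confidence_interval` is hypothetical. Because the code does not round the standard error to 0.39 before multiplying, its limits differ slightly from the hand calculation.

```python
import math
import statistics

# Duration of biologic use in years, copied from Table 27-1.
durations = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]

n = len(durations)
mean = statistics.mean(durations)
sd = statistics.stdev(durations)   # sample SD, denominator n - 1
se = sd / math.sqrt(n)             # standard error of the mean

# Critical t values for df = n - 1 = 9, as quoted in the text (assumed, not computed here).
T_CRITICAL = {0.95: 2.26, 0.99: 3.25}


def confidence_interval(level):
    """Return (lower, upper) limits: mean plus or minus t times the standard error."""
    margin = T_CRITICAL[level] * se
    return mean - margin, mean + margin


lower_95, upper_95 = confidence_interval(0.95)
lower_99, upper_99 = confidence_interval(0.99)

print(f"SE     = {se:.3f}")
print(f"95% CI = {lower_95:.2f} to {upper_95:.2f}")
print(f"99% CI = {lower_99:.2f} to {upper_99:.2f}")
```

The wider 99% interval reflects the larger t value: more confidence requires a larger margin around the same sample mean.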
A retrospective descriptive study examined the duration of biologic use from veterans with rheumatoid arthritis (Tran et al., 2009). The values in Table 27-4 were extracted from a larger sample of veterans who had a history of biologic medication use (e.g., infliximab [Remicade], etanercept [Enbrel]).
Table 27-4 contains simulated demographic data collected from 10 veterans who had stopped taking biologic medications. Age at study enrollment, duration of biologic use, race/ethnicity, gender (F = female), tobacco use (F = former use, C = current use, N = never used), primary diagnosis (3 = irritable bowel syndrome, 4 = psoriatic arthritis, 5 = rheumatoid arthritis, 6 = reactive arthritis), and type of biologic medication used were among the study variables examined.

TABLE 27-4 DEMOGRAPHIC VARIABLES OF VETERANS WITH RHEUMATOID ARTHRITIS

| Patient ID | Duration (yrs) | Age | Race/Ethnicity | Gender | Tobacco | Diagnosis | Biologic |
|---|---|---|---|---|---|---|---|
| 1 | 0.1 | 42 | Caucasian | F | F | 5 | Infliximab |
| 2 | 0.3 | 41 | Black, not of Hispanic Origin | F | F | 5 | Etanercept |
| 3 | 1.3 | 56 | Caucasian | F | N | 5 | Infliximab |
| 4 | 1.5 | 78 | Caucasian | F | F | 3 | Infliximab |
| 5 | 1.5 | 86 | Black, not of Hispanic Origin | F | F | 4 | Etanercept |
| 6 | 2.0 | 49 | Caucasian | F | F | 6 | Etanercept |
| 7 | 2.2 | 82 | Caucasian | F | F | 5 | Infliximab |
| 8 | 3.0 | 35 | Caucasian | F | N | 3 | Infliximab |
| 9 | 3.0 | 59 | Black, not of Hispanic Origin | F | C | 3 | Infliximab |
| 10 | 4.0 | 37 | Caucasian | F | F | 5 | Etanercept |
This is how our data set looks in SPSS. Step 1: For a nominal variable, the appropriate descriptive statistics are frequencies and percentages. From the “Analyze” menu, choose “Descriptive Statistics” and “Frequencies.” Move “Race/Ethnicity” and “Gender” over to the right. Click “OK.”
Step 2: For a continuous variable, the appropriate descriptive statistics are means and standard deviations. From the “Analyze” menu, choose “Descriptive Statistics” and “Explore.” Move “Duration” over to the right. Click “OK.”
INTERPRETATION OF SPSS OUTPUT
The following tables are generated from SPSS. The first set of tables (from the first set of SPSS commands in Step 1) contains the frequencies of race/ethnicity and gender. Most (70%) were Caucasian, and 100% were female.

Frequencies: Frequency Table

| RaceEthnicity | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Black, not of Hispanic Origin | 3 | 30.0 | 30.0 | 30.0 |
| Caucasian | 7 | 70.0 | 70.0 | 100.0 |
| Total | 10 | 100.0 | 100.0 | |

| Gender | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| F | 10 | 100.0 | 100.0 | 100.0 |

Descriptives

| Duration of Biologic Use | Statistic | Std. Error |
|---|---|---|
| Mean | 1.890 | .3860 |
| 95% Confidence Interval for Mean, Lower Bound | 1.017 | |
| 95% Confidence Interval for Mean, Upper Bound | 2.763 | |
| 5% Trimmed Mean | 1.872 | |
| Median | 1.750 | |
| Variance | 1.490 | |
| Std. Deviation | 1.2206 | |
| Minimum | .1 | |
| Maximum | 4.0 | |
| Range | 3.9 | |
| Interquartile Range | 2.0 | |
| Skewness | .159 | .687 |
| Kurtosis | −.437 | 1.334 |

Exercise 27: Calculating Descriptive Statistics. Copyright © 2017, Elsevier Inc. All rights reserved.
The second set of output (from the second set of SPSS commands in Step 2) contains the descriptive statistics for “Duration,” including the mean, s (standard deviation), SE, 95% confidence interval for the mean, median, variance, minimum value, maximum value, range, and skewness and kurtosis statistics. As shown in the output, the mean number of years for duration is 1.89, and the SD is 1.22. The 95% CI is 1.02–2.76.
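For readers without SPSS, the Step 1 frequency table can be approximated in Python. This is a sketch, not SPSS output: the race/ethnicity values are typed in from Table 27-4 in patient ID order, and the helper name `frequency_table` is hypothetical.

```python
from collections import Counter

# Race/ethnicity for patients 1-10, copied from Table 27-4.
race_ethnicity = [
    "Caucasian",
    "Black, not of Hispanic Origin",
    "Caucasian",
    "Caucasian",
    "Black, not of Hispanic Origin",
    "Caucasian",
    "Caucasian",
    "Caucasian",
    "Black, not of Hispanic Origin",
    "Caucasian",
]


def frequency_table(values):
    """Map each category to a (frequency, percent) pair."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, 100 * count / total) for category, count in counts.items()}


table = frequency_table(race_ethnicity)
for category, (count, percent) in sorted(table.items()):
    print(f"{category:32s} {count:3d} {percent:6.1f}%")
```

Frequencies and percentages are the appropriate summary here because race/ethnicity is a nominal variable, so a mean or SD would have no meaning.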
1. Define mean.
2. What does this symbol, $s^2$, represent?
3. Define outlier.
4. Are there any outliers among the values representing duration of biologic use?
5. How would you interpret the 95% confidence interval for the mean of duration of biologic use?
6. What percentage of patients were Black, not of Hispanic origin?
7. Can you compute the variance for duration of biologic use by using the information presented in the SPSS output above?
Important information for writing discussion questions and participation
Please read through the following information on writing a Discussion question response and participation posts.
Contact me if you have any questions.
Important information on Writing a Discussion Question
- Your response needs to be a minimum of 150 words (not including your list of references)
- There needs to be at least TWO references with ONE being a peer reviewed professional journal article.
- Include in-text citations in your response
- Do not include quotes—instead summarize and paraphrase the information
- Follow APA-7th edition
- Points will be deducted if the above is not followed
Participation: replies to your classmates or instructor
- A minimum of 6 responses per week, on at least 3 days of the week.
- Each response needs at least ONE reference with citations—best if it is a peer reviewed journal article
- Each response needs to be at least 75 words in length (does not include your list of references)
- Responses need to be substantive, bringing new information to the discussion or otherwise enhancing it. Responses of “I agree” or “great post” do not count toward the word count.
- Follow APA 7th edition
- Points will be deducted if the above is not followed
- Remember to use and follow APA-7th edition for all weekly assignments, discussion questions, and participation points.
- Here are some helpful links
- Student paper example
- Citing Sources
- The Writing Center is a great resource
Welcome to class
Hello class, and welcome. I will be your instructor for this course. This is a -week course and requires a lot of time commitment, organization, and a high level of dedication. Please use the class syllabus to guide you through all the assignments required for the course. I have also attached the classroom policies to this announcement so that you know what is expected of you in this course. Please review this document carefully and ask me if you have any questions. You can email me at any time or send me a message via the “message” icon in halo if you need to contact me. I check my email regularly, so you should get a response within 24 hours. If you have not heard from me within 24 hours and need to contact me urgently, please send a follow up text to.
I strongly encourage that you do not wait until the very last minute to complete your assignments. Your assignments in weeks 4 and 5 require early planning as you would need to present a teaching plan and interview a community health provider. I advise you look at the requirements for these assignments at the beginning of the course and plan accordingly. I have posted the YouTube link that explains all the class assignments in detail. It is required that you watch this 32-minute video as the assignments from week 3 through 5 require that you follow the instructions to the letter to succeed. Failure to complete these assignments according to instructions might lead to a zero. After watching the video, please schedule a one-on-one with me to discuss your topic for your project by the second week of class. Use this link to schedule a 15-minute session. Please, call me at the time of your appointment on my number. Please note that I will NOT call you.
Please be advised that I do NOT accept any assignments by email. If you are having technical issues with uploading an assignment, contact the technical department and inform me of the issue. If you have any issues that would prevent you from getting your assignments to me by the deadline, please inform me to request a possible extension. Note that working full time or overtime is no excuse for late assignments. There is a 5% deduction for every day your assignment is late. This only applies to approved extensions. Late assignments without an approved extension will not be accepted.
If you think you will need accommodations for any reason, please contact the appropriate department to request them.
Plagiarism is strictly prohibited. Please ensure you are citing your sources correctly using APA 7th edition. All assignments, including discussion posts, should be formatted in APA with the appropriate spacing, font, margins, and indents. Any papers that are not properly formatted will be returned to you, so I advise you to review APA formatting style. I have attached a sample paper in APA format and will also post sample discussion responses in subsequent announcements.
Your initial discussion post should be a minimum of 200 words and response posts should be a minimum of 150 words. Be advised that I grade based on quality and not necessarily the number of words you post. A minimum of TWO references should be used for your initial post. For your response post, you do not need references as personal experiences would count as response posts. If you however cite anything from the literature for your response post, it is required that you cite your reference. You should include a minimum of THREE references for papers in this course. Please note that references should be no more than 5 years old except recommended as a resource for the class. Furthermore, for each discussion board question, you need ONE initial substantive response and TWO substantive responses to either your classmates or your instructor for a total of THREE responses. There are TWO discussion questions each week, hence, you need a total minimum of SIX discussion posts for each week. I usually post a discussion question each week. You could also respond to these as it would count towards your required SIX discussion posts for the week.
I understand this is a lot of information to cover in 5 weeks, however, the Bible says in Philippians 4:13 that we can do all things through Christ that strengthens us. Even in times like this, we are encouraged by God’s word that we have that ability in us to succeed with His strength. I pray that each and every one of you receives strength for this course and life generally as we navigate through this pandemic that is shaking our world today. Relax and enjoy the course!