Introduction
There is no doubt that biostatistics is applicable in all spheres of life, the public healthcare sector being no exception. Public health has been thought of as the science that encompasses protecting as well as improving the health status of a given group of individuals (Sullivan, 2007). This is usually done through educating them, promoting healthy lifestyles, and carrying out research on various diseases and on how to prevent injuries. It is worth noting that professionals in public health are capable of analyzing the effects of the environment, hereditary components (genetics), and individual preferences on human health (Burns & Grove, 2003). This in turn helps relevant authorities to come up with plans aimed at protecting the health of a given community; all this is thanks to statistics.
Research has been defined as a systematic, rigorous, logical, and objective process of inquiry into a problem. The main reason for carrying out research is thus to uncover relevant information and gain data and empirical evidence about a given phenomenon of interest (Elston & Johnson, 1994). It is therefore a sequential process that follows a well-defined set of steps.
Although there are various definitions of biostatistics (biometry or biometrics), the one adopted for the purposes of this paper is that it refers to the science of applying statistics to solving complex problems that are evident in all areas of biology as well as medicine. As noted previously, biostatistics is applicable in a range of varied fields: in public health, which entails epidemiology, health services research, and nutrition; in designing and analyzing clinical trials in medicine; and in ecology, ecological forecasting, and biological sequence analysis, to mention but a few (Polit & Beck, 2007).
Although all areas of biostatistics are significant, two major sections are of particular interest and usually dictate the outcome of research findings; these are sampling, and the tools and methods used in data collection. For that matter, there is a need to clearly understand what these two sections entail. Numerous studies have been carried out with wrong sampling procedures, wrong sample sizes, and wrong tools and methods of data collection. This has resulted in wrong and incomplete outcomes that were translated into poor decisions which, instead of safeguarding the health of the community, left it worse off than before.
Sampling
Sampling refers to the process of selecting a few individuals from a given group of persons, or a population, for the purposes of gaining insights into a number of attributes under study as well as making generalizations about the population. This helps relevant authorities, especially in public health, to come up with rational decisions geared towards bettering the health status of the community (Adèr et al., 2008).
As suggested by Sullivan (2007), the process brings with it numerous advantages, such as saving time and other resources and allowing for homogeneity, to mention but a few.
For this reason, there is a need for one to clearly understand and apply sampling and its components in a succinct manner, as this, coupled with excellent execution of other research techniques, results in the desired research outcome and thereby helps in making rational decisions.
Population and sample
When carrying out research, there is no way a researcher can include all the individuals he/she is to study; rather, he/she must choose some representatives of the same. This is attributed to the expense and time consumption of trying to gain inferences from the entire group. A population refers to all individuals or subjects that have certain characteristics of interest to a researcher and from which he/she needs to uncover information. An example of a population might be individuals who are prone to anemia within a given locale (Pagano & Gauvreau, 1993).
As stated earlier, it is not possible for researchers to make inferences and draw conclusions by studying the whole population; for that matter, a representative portion of the population is drawn to help scholars collect data and later generalize the results. The selected portion of the subjects is what is termed the sample.
Advantages of using a sample
It would be impractical to carry out a research study in which every one of thousands of elements in the population had to be examined. It is worth noting that sampling fosters accuracy of the results: because samples are much smaller than the population, data can be collected and checked more carefully, which makes the findings more accurate.
Additionally, using a sample rather than the whole population has been hailed as saving time as well as money and other kinds of resources. Similarly, analysis of data is made easier, since samples are more amenable to quality checks and better suited to statistical testing (Elston & Johnson, 1994).
Sampling techniques
There are two main techniques used in sampling: random and non-random sampling. Random or probability sampling is applicable to both qualitative and quantitative research; it gives each and every element a calculable, non-zero probability of being included in the sample representing the population under study (Pagano & Gauvreau, 1993).
Random Sampling
The subcategories of random sampling include simple random sampling, in which each individual or element of a target population has an equal chance of being included in the sample. With this type, a number is allocated to every element; a table of random numbers is then used to arrive at the desired sample size and elements. Systematic random sampling is where every Kth element in the total list is chosen in a systematic manner for inclusion in the sample (Polit & Beck, 2007). For instance, if there is a need to tackle an issue relating to anemia in 1,000 pregnant women and the researcher resorts to recruiting 100 of them, then every 10th element will be selected for inclusion in the sample, as in the sketch below. To eliminate bias, the first element is selected at random.
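As a minimal illustrative sketch (assuming a numbered sampling frame of 1,000 women and a desired sample of 100; the figures are hypothetical, not taken from any study), the two selection procedures described above could be carried out as follows:

    import random

    frame = list(range(1, 1001))       # hypothetical sampling frame: IDs 1-1000
    n = 100                            # desired sample size

    # Simple random sampling: every element has an equal chance of selection
    simple_sample = random.sample(frame, n)

    # Systematic random sampling: random start, then every k-th element
    k = len(frame) // n                # sampling interval, here 10
    start = random.randint(0, k - 1)   # random start removes selection bias
    systematic_sample = frame[start::k]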
Stratified random sampling is applicable when the research problem requires a comparison between various subgroups. For instance, studying cancer prevalence in smokers and non-smokers among the residents of a given county in the United States of America could use stratified sampling. It is upon the researcher to establish the characteristics he/she will use to stratify the sample, for instance age and education level, among other variables. Cluster random sampling is used when the unit of sampling is a naturally occurring group of elements rather than an individual element. It is applicable when it is much easier to select a group than individuals (Polit & Beck, 2007).
For instance, if public health officials need to establish the smoking habits of individuals aged 16 years in a given city, this type of sampling can be employed. The city can be subdivided into blocks of, say, 100 houses each, and the blocks numbered rather than the individual houses. The blocks to be sampled are then drawn at random, and all individuals aged 16 within them are studied, as sketched below.
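A minimal sketch of this cluster approach, assuming a purely hypothetical list of residents tagged with a block number and an age, might look as follows:

    import random

    # Hypothetical data: each resident is a (block_id, age) pair in a city of 100 blocks
    residents = [(random.randint(1, 100), random.randint(10, 80)) for _ in range(5000)]

    # Cluster sampling: draw whole blocks at random, then study every
    # eligible individual (here, those aged 16) inside the sampled blocks
    sampled_blocks = set(random.sample(range(1, 101), 20))   # e.g. 20 of the 100 blocks
    cluster_sample = [r for r in residents
                      if r[0] in sampled_blocks and r[1] == 16]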
Non Random Sampling
This method is deemed cheaper and easier to put into practice. The technique is characterized by some elements having no chance of being selected. Among the subgroups of this technique is convenience or accidental sampling (Adèr et al., 2008). For instance, a researcher can resort to this method when the elements are within reach, willing, and/or easy to recruit. This is useful in exploratory research, where the objective is to gain more insight into a given topic rather than to generalize the outcome.
According to Adèr et al. (2008), snowball sampling is applicable when recruiting a sample is a difficult task, especially when studying sensitive issues such as drug abuse. The researcher asks an initial group to recruit other individuals who share the same characteristics. As suggested by Elston and Johnson (1994), purposive sampling is a deliberate, non-random method aimed at selecting a sample, setting, or event with predetermined characteristics. It is used mainly in qualitative research designs aimed at generating hypotheses for further research or piloting a research instrument. The outcomes are not generalizable.
Lastly, quota sampling is related to stratified random sampling, but the selection of the sample is not random. Characteristics such as age and gender are used to calculate the proportion of, for instance, males and females to be studied, as illustrated below. It is worth noting that the choice of sample members is left in the hands of the researcher (Adèr et al., 2008).
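The arithmetic behind the quotas is simple; a hypothetical sketch (the population shares and sample size below are assumptions chosen only for the example) is:

    # Quota sampling: fix the number of respondents per subgroup in advance
    # so the sample mirrors known population proportions
    population_share = {"female": 0.52, "male": 0.48}   # assumed proportions
    n = 200                                             # total sample size

    quotas = {group: round(share * n) for group, share in population_share.items()}
    # -> {'female': 104, 'male': 96}; which individuals fill each quota is
    #    left to the researcher, so selection within groups is not random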
Sample size
Sample size is one important component of sampling. There are various ways, formulas, as well as software packages developed to help in determining the desired sample size (Pagano & Gauvreau, 1993). For instance, in ecological studies researchers use the species-area curve. The sample size should be large enough to allow representativeness of the population being studied, analysis of subgroups, as well as statistical analysis. On the same note, the sample ought to cater for attrition.
A very large sample size increases the risk that trivially small effects will be declared statistically significant, in effect leading the researcher to reject the null hypothesis when the finding has no practical importance. Conversely, when the sample size is too small there is a higher risk of committing a type II error, that is, failing to reject a null hypothesis when in fact it ought to be rejected.
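As a worked illustration, one commonly cited formula for the sample size needed to estimate a proportion (Cochran's formula; the confidence level, expected proportion, margin of error, and attrition allowance below are assumptions chosen only for the example) can be sketched as follows:

    import math

    z = 1.96    # z-value for a 95% confidence level
    p = 0.5     # expected proportion (0.5 gives the most conservative size)
    d = 0.05    # desired margin of error

    n = (z ** 2) * p * (1 - p) / d ** 2            # about 384 respondents
    n_with_attrition = math.ceil(n / (1 - 0.10))   # inflated ~10% for attrition, about 427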
Tools and methods of data collection
Sources of data
The data required to fully address the objectives and research questions related to public health are obtained through the perusal of secondary as well as primary data sources. It is worth mentioning that before embarking on the collection of the required data, one would first seek permission from the relevant authorities as well as approval from an independent research review board.
Pagano and Gauvreau (1993) describe secondary data as data that have been collected by an individual other than the one who is using them. Secondary sources are thus the materials that carry such data. Strictly speaking, the information has been gathered by researchers and recorded in books, publications, and journals, among other media. Internal secondary sources usually sought while carrying out research in public health include information compiled by a given healthcare facility, such as patient records. External secondary data sources, on the other hand, include information from government sources and previous research studies relating to public health.
The major advantages of secondary sources of data are that, compared with primary sources, they are easy to locate and at the same time cheap; they are less obtrusive; they are the best way to obtain information in areas where collecting data directly is impossible; they allow for longitudinal analysis; and there is the possibility of them yielding data as accurate as primary data (Burns & Grove, 2003). However, there are several disadvantages of collecting data from secondary sources: the data may not be up to date; the user in most cases has no control over the data already collected; there are cases where obtaining some of the necessary information is unattainable, especially in government offices; and it may take long to sift out the desired information (Adèr et al., 2008).
Primary data are those data collected by the one who is using, or intends to use, them. Such data are obtained directly from the study sample; hence the researcher is immersed in the study and interacts directly with the study population (Pagano & Gauvreau, 1993). For this reason the collected data are deemed authentic and up to date compared with secondary data. The desired data can be collected in a number of ways, such as interviews, focus group discussions, and questionnaires; data can be collected across borders through mailed questionnaires and telephone interviews; the user has control of the data, unlike with secondary sources; and data can be collected from a large population.
The drawbacks associated with primary sources of data include design problems, such as developing the surveys themselves, and the chances of delayed and incomplete responses when questionnaires are used (Burns & Grove, 2003).
Tools for data collection
There are various tools or instruments used to collect data, especially primary data. They include interviews, observation, focus group discussions, and questionnaires.
Observation
Observation entails watching the activities, interactions, or behavior of the individuals being studied. There are two types of observation: participant and non-participant. In the former the researcher joins the group and becomes part of it, while in the latter the researcher is not part of the group.
Among the advantages associated with observation are that it allows the researcher to probe social life in a natural setting, it promotes in-depth understanding of the subject matter, it is effective for studying attitudes, behavior, and social processes over time, it is flexible in research design, and it allows the use of audio and video recordings (Beiske, 2002). The demerits of observation as a data collection tool are that most of the data collected are qualitative, which may lead to biased observations; data analysis is a big challenge; the presence of the observer has the potential to introduce reactive effects in the participants; and, last but not least, it is impossible to observe a large random sample, so the study yields poor generalization of the results.
Questionnaires
With regard to questionnaires, there are two types: structured or closed-ended, and unstructured or open-ended questionnaires. When questionnaires are used to collect data, they are distributed to the selected sample and later collected on an agreed date or immediately once filled in.
According to Beiske (2002), questionnaires cover a large population at a time, as they can be distributed to different participants at once and collected later or on the same day depending on the willingness of the respondents to address the questions. Because they are standardized, they are more objective; the data collected are easy to analyze; owing to familiarity with the tool, respondents will not be apprehensive; questionnaires are also very cost-effective compared with face-to-face interviews; they reduce bias; they are an effective tool for covering sensitive issues; they foster anonymity; and they allow respondents to complete the questions at their own pace.
According to Zolman (1993), the major problems with questionnaires as a data collection tool are that respondents tend to forget vital information and may answer the questions superficially when the questionnaire is long; to counter this, researchers need to develop a short but comprehensive questionnaire. Due to standardization, there is no room for explanation in case respondents misinterpret or do not understand the questions, and the tool is applicable to literate individuals only (Sullivan, 2007). This error can be mitigated by surveying numerous respondents whose differing answers are later combined to produce a comprehensive and conclusive analysis.
Interviews
Another tool for data collection is the interview, which involves collecting data by talking to the respondents/interviewees. There are two types of interviews: personal (face-to-face) and telephone interviews. This tool is chosen because there is direct contact with the interviewee when the research is carried out, hence first-hand information is obtained. Both the interviewer and the interviewee are able to clarify issues during the research, making it possible to obtain information that is well elaborated and authentic (Polit & Beck, 2004).
It is also a flexible data collection tool: when questions are not well grasped by the interviewee, the researcher has room to rephrase and elaborate on them. Interviews allow one to learn about things and facts that cannot be observed directly, the response rate is higher than that of questionnaires, the interviewee need not be literate, more complex and detailed questions can be asked, and, finally, interviews add internal viewpoints to outward behaviors.
Despite the advantages mentioned, according to Sullivan (2007) interviewing is a slow method because the process calls for interviewing one person at a time, and it cannot fully trace events and trends that occurred in the past. Additionally, the interview is an expensive tool to use; it is also subject to respondent and interviewer bias, and where the interviewer cannot talk directly to the interviewee, an interpreter is needed, resulting in interpreter bias.
Focus Group Discussion
This is a tool for collecting primary data and mainly involves an unstructured interview with a small group of people who interact with one another and with a focus group leader, who is basically the researcher (Polit & Beck, 2004). For instance, individuals suffering from a given ailment such as HIV and AIDS can be engaged to establish the best way to cater for their health needs. It is worth noting that the tool makes use of group dynamics to stimulate discussion, gain insights, and generate ideas about the topic under study. It is the best tool for establishing and answering questions about what the participants think regarding the issue at hand, how they think about it, and why they think in a certain manner. Ideally, the group should consist of 5-10 individuals, with the discussion lasting between 1 and 2 hours (Beiske, 2002).
As suggested by Ahlbom (1993), the tool is appropriate for action research, allows more complex issues to be brought into a group setting, is less threatening than a one-on-one interview, and is useful in trying to understand public opinion on a given subject. However, the tool is associated with a number of drawbacks. It can be time consuming, especially when the focus group leader fails to take control of the discussion; it is limited to purposive or convenience sampling; data collection and analysis are difficult; and it calls for a skilled leader, without whom it is difficult to utilize the existing group dynamics.
Validity and reliability of tools
It is of paramount importance to test the reliability and validity of the tools used to collect data so that the generalizability of the results is guaranteed. According to Zolman (1993), reliability has been thought of as the extent to which results are consistent over time and an accurate representation of the total population under study. A tool is deemed reliable if the outcomes of a study can be replicated and reproduced. It is worth noting that reliability can be assessed using three main approaches: inter-rater reliability, parallel forms reliability, and test-retest reliability. Tests that can be used to assess reliability include correlation, the equivalence approach, and triangulation, among others.
Test-retest reliability, also known as the coefficient of stability, is computed by administering the tool to the same group of people on two different occasions and then carrying out a correlation analysis; if the correlation is high, the tool is reliable. Split-half reliability, also known as the coefficient of internal consistency, measures how well the questions on an instrument assess the same skill or characteristic. It is computed by dividing an instrument into two parts, for instance odd- and even-numbered questions, and then executing a correlation analysis (Polit & Beck, 2004), as sketched below.
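A minimal sketch of the two computations, using purely illustrative scores and the usual Spearman-Brown correction for the split-half coefficient, might look as follows:

    import numpy as np

    # Test-retest (coefficient of stability): the same 8 respondents measured twice
    time1 = np.array([12, 15, 11, 18, 14, 16, 13, 17])
    time2 = np.array([13, 14, 11, 17, 15, 16, 12, 18])
    test_retest_r = np.corrcoef(time1, time2)[0, 1]

    # Split-half (internal consistency): correlate odd- and even-item totals,
    # then apply the Spearman-Brown correction for the halved test length
    odd_totals  = np.array([30, 25, 28, 33, 27, 31])
    even_totals = np.array([29, 26, 27, 34, 28, 30])
    r_half = np.corrcoef(odd_totals, even_totals)[0, 1]
    split_half_reliability = 2 * r_half / (1 + r_half)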
“Validity refers to the degree to which the test or measurement tools are truly able to measure what the researcher intended to measure” (Beiske, 2002). Strictly speaking, validity tries to establish whether the instrument used actually hit the bull’s eye of the main objective of the research. There are numerous validated measures and validity indices, such as the Wechsler Adult Intelligence Scale, the SAT, and the content validity index, to mention a few (Ahlbom, 1993).
Conclusion
From the review of the concept of biostatistics, it is evident that it is applicable in all spheres of human life, public health being no exception. Two major issues relating to research have been addressed in a succinct manner: sampling and data collection. With regard to the latter, the various sources of data, primary and secondary, have been brought to light. On the same note, the various tools or instruments of data collection, such as interviews, questionnaires, focus group discussions, and observation, have been adequately covered. The validity and reliability of such tools have also been briefly tackled, as they are of paramount importance in coming up with the desired outcomes.
With regard to sampling, various terms such as population and sample were defined. The advantages of using sampling in research have also been brought to light. The two main techniques of sampling, random and non-random, plus their subcategories and applications, have been addressed. Lastly, the importance of having the right sample size has been emphasized.
References
Adèr, H. J., Mellenbergh, G. J., & Hand, D. J. (2008). Advising on research methods: A consultant’s companion. Huizen, The Netherlands: Johannes van Kessel Publishing.
Ahlbom, A. (1993). Biostatistics for Epidemiologists. Boca Raton: Lewis Publishers.
Beiske, B. (2002). Research Methods: Uses and Limitations of Questionnaires, Interviews, and Case Studies. Manchester: University of Manchester.
Burns, N. & Grove, S. (2003). Understanding Nursing Research. New York: Blackwell Publishing.
Elston, R. & Johnson, W. (1994). Essentials of Biostatistics. Philadelphia: F.A. Davis.
Pagano, M., & Gauvreau, K. (1993). Principles of Biostatistics. Belmont, CA: Duxbury Press.
Polit, D. & Beck, C. (2004). Nursing Research: Principles and Methods. New York: Lippincott Williams & Wilkins.
Polit, F. & Beck, C. (2007). Essentials of Nursing Research. Baltimore: Lippincott.
Sullivan, L. (2007). Essentials of Biostatistics in Public Health Care. Sudbury, MA: Jones & Bartlett Learning.
Zolman, J.F. (1993). Biostatistics. New York: Oxford University Press.