Transport for London: Marketing Analytics

Sampling Scheme

The appropriate sampling scheme for Transport for London (TfL) to use in surveying passengers’ attitudes toward and preferences for Night Tube services is the stratified method. Since TfL lacks a sampling frame, stratified sampling would allow passengers to be categorized into distinct groups with unique attributes. The scheme would stratify the passenger population by the five transport lines, namely, Jubilee, Central, Northern, Victoria, and Piccadilly.

The stratified sampling method is relevant because the transport lines form strata that classify passengers into diverse groups. By collecting data from passengers on each transport line, TfL management would be in a position to capture varied attitudes and preferences for Night Tube services. According to Malhotra, Nunan, and Birks (2017), stratified sampling requires cases within a stratum to be homogeneous and cases in different strata to be heterogeneous.

In this view, stratified sampling would aim to highlight unique and similar attitudes and preferences within and between transport lines. As TfL does not have a sampling frame to allow simple random sampling, systematic sampling would be used to select passengers in each transport line for a survey.
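The two-stage selection described above can be sketched in Python. This is a minimal illustration, not TfL's actual procedure: it assumes that passengers on each line can be observed as an ordered stream (for example, in order of passing a ticket gate), and the function names, the sampling interval k, and the passenger labels are all hypothetical.

```python
import random

# The five strata named in the text.
LINES = ["Jubilee", "Central", "Northern", "Victoria", "Piccadilly"]


def systematic_sample(passengers, k, rng):
    """Pick every k-th passenger after a random start in [0, k).

    No sampling frame is needed: only the order in which passengers
    are encountered matters.
    """
    if k < 1:
        raise ValueError("sampling interval k must be at least 1")
    start = rng.randrange(k)  # random starting position
    return passengers[start::k]


def stratified_systematic_sample(stream_by_line, k, seed=None):
    """Apply systematic sampling separately within each stratum (line)."""
    rng = random.Random(seed)
    return {
        line: systematic_sample(stream, k, rng)
        for line, stream in stream_by_line.items()
    }


if __name__ == "__main__":
    # Hypothetical streams of 100 passengers per line.
    streams = {line: [f"{line}-{i}" for i in range(100)] for line in LINES}
    sample = stratified_systematic_sample(streams, k=10, seed=1)
    for line, chosen in sample.items():
        print(line, len(chosen), chosen[:3])
```

Because the interval is applied within each line, every stratum contributes respondents, which is the property the stratified design relies on.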

Comparative analysis shows that the stratified sampling method has some advantages over other sampling methods. A significant advantage of stratified sampling is that it is a probability scheme that eliminates researchers’ bias and enhances the representation of the population. Roy and Acharya (2016) assert that probability methods of sampling offer an equal chance of selection, and thus, they generate a valid and reliable sample that represents the target population.

In this case, the sampling method would enable researchers to select passengers who reflect the target population of TfL. Another critical advantage of stratified sampling is that it accounts for the diversity of sub-populations by classifying them into strata for adequate representation. Transport lines in TfL serve as strata of diverse sub-populations of passengers. The advantages of systematic sampling are that it improves representativeness, simplifies execution, and does not require a sampling frame (Malhotra et al., 2017). The absence of a sampling frame, in this case, makes systematic sampling a relevant approach to the selection of passengers for the survey.

However, the disadvantages of probability sampling are that it is more expensive, cumbersome, and sophisticated than non-probability sampling (Malhotra et al., 2017). Moreover, systematic sampling can introduce systematic error if the order in which passengers are encountered follows a periodic pattern. Despite these disadvantages, stratified sampling is an appropriate method for choosing passengers to survey and obtaining their attitudes and preferences for Night Tube services.

In the selection of the sampling scheme, non-probability methods were rejected due to major setbacks. Malhotra et al. (2017) hold that non-probability sampling ignores a significant proportion of the population and provides low external validity of the sample.

Convenience sampling was rejected because it is prone to selection bias and gives poor representation of the population. Similarly, judgmental sampling was rejected because it generates a subjective sample with a low level of representation of passengers. Quota and snowball methods were not appropriate since they are not only subject to researchers’ biases and unlikely to generate a representative sample but also cumbersome to undertake (Malhotra et al., 2017). Overall, the issues of selection bias and representation led to the rejection of the non-probability methods.

Although the study preferred probability methods, it rejected simple random sampling and cluster sampling because they do not provide appropriate parameters to enhance the representation of the population. The absence of a sampling frame makes it impossible to apply simple random sampling in the selection of passengers. Malhotra et al. (2017) argue that simple random sampling requires a sampling frame and generates a large sample, which makes data collection cumbersome, expensive, and prone to substantial standard errors. Moreover, the lack of a sampling frame makes the selection of clusters difficult under the cluster method. The existence of five transport lines and over a billion passengers makes cluster sampling infeasible.

Statistics

Developing a Likert Scale

The first decision to consider in the development of a Likert scale is the number of data points. The higher the number of data points, the more sensitive the scale is in measuring the construct of interest. Nevertheless, participants in a study are limited in the number of data points among which they can validly discriminate. Malhotra et al. (2017) recommend the use of 5 to 9 data points on Likert scales. Within this range, numerous factors determine the appropriate number of data points. The number of points varies according to the knowledge of participants, the method of data collection, and the nature of analysis (Malhotra et al., 2017).

Participants who are very knowledgeable about the subject matter can handle a high number of data points, whereas those without expertise need a small number. In data collection, telephone and mail questionnaires require fewer data points due to limitations of time and space, while online surveys can accommodate more. Regarding the nature of analysis, pooling of Likert items and generalization of findings require fewer data points, whereas the use of sophisticated statistical tools needs more.

Balancing of the scale is the second decision that requires consideration. Malhotra et al. (2017) say that balancing a scale is necessary to avert the skewness of responses. The favorable and unfavorable response categories for a given statement ought to be equal in number; otherwise, the imbalance must be accounted for during analysis to avoid skewed results. The third decision entails the use of an odd or even number of data points on a Likert scale. If researchers expect participants to hold a neutral position, a Likert scale ought to have an odd number of points. However, if researchers want to compel participants to lean toward one side, a Likert scale should have an even number of points. The fourth decision involves the choice between a forced and a non-forced scale. On a forced-choice scale, participants who do not have any opinion tend to mark the neutral point of the scale.

Therefore, the design of a Likert scale ought to consider whether participants are likely to have an opinion regarding the subject matter under study. The fifth consideration is that Likert items may have numerical, pictorial, or verbal descriptions, depending on the nature of participants. Moreover, researchers need to balance the use of weak and strong anchors to prevent ambiguity in responses (Malhotra et al., 2017). The sixth decision that researchers should consider is the physical form of the scale, which should match the psychometric needs of participants. For instance, emoticons suit children, whereas numbers suit adults.

Descriptive Statistics

The sample mean is an essential descriptive statistic because it measures the central tendency of continuous data. When the data follow the normal distribution, the mean is the best measure of central location because the mode and median do not deviate from it significantly (Roy & Acharya, 2016). Hence, the sample mean is an appropriate estimate of the location of a distribution. In contrast, the sample standard deviation is a critical descriptive statistic of the dispersion of a distribution. The degree of dispersion shows the nature of variability in the population and enriches the interpretation of data.
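As a brief illustration of the two statistics, the following Python sketch computes them with the standard library. The observations are hypothetical values chosen for the example and are not data from the study.

```python
import statistics

# Hypothetical observations (illustrative only, not study data).
observations = [4, 5, 3, 4, 2, 5, 4]

# Sample mean: the estimate of the location of the distribution.
sample_mean = statistics.mean(observations)

# Sample standard deviation: uses the n - 1 denominator, so it
# estimates the dispersion of the population from a sample.
sample_sd = statistics.stdev(observations)

print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}")
```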

Sample Size Determination

Choosing a sample size requires consideration of numerous statistical factors. The first step is to specify the level of precision, either as the maximum permissible difference (D) between the sample mean and the population mean or, for proportions, as the maximum permissible difference between the sample and population proportions, for example, ±5% (Malhotra et al., 2017). The next step is to specify the desired confidence level (90%, 95%, or 99%) in both the means and the proportions approaches.

Subsequently, the standard score (z value) associated with the desired confidence level is determined, followed by an estimate of the population standard deviation (for means) or the population proportion π (for proportions). Ultimately, the formula of the standard error is used to determine the sample size based on means [n = σ²z²/D²] and proportions [n = π(1-π)z²/D²] (Malhotra et al., 2017). If the resulting sample size is greater than 10% of the population, the finite population correction factor is applied.
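The two formulas can be applied directly, as in the Python sketch below. The function names and the input values in the usage section are illustrative assumptions. The text does not give the form of the finite population correction, so the commonly used adjustment n·N/(N + n - 1) is assumed here.

```python
import math
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided standard score for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def sample_size_mean(sigma, d, level=0.95):
    """n = sigma^2 * z^2 / D^2, rounded up to a whole respondent."""
    z = z_for_confidence(level)
    return math.ceil(sigma ** 2 * z ** 2 / d ** 2)


def sample_size_proportion(pi, d, level=0.95):
    """n = pi * (1 - pi) * z^2 / D^2, rounded up to a whole respondent."""
    z = z_for_confidence(level)
    return math.ceil(pi * (1 - pi) * z ** 2 / d ** 2)


def apply_finite_population_correction(n, population):
    """Shrink n when it exceeds 10% of the population.

    Assumed form of the correction: n * N / (N + n - 1).
    """
    if n > 0.10 * population:
        return math.ceil(n * population / (population + n - 1))
    return n


if __name__ == "__main__":
    # Illustrative inputs: pi = 0.5 is the most conservative choice.
    n = sample_size_proportion(pi=0.5, d=0.05, level=0.95)
    print("proportion-based n:", n)
    print("mean-based n:", sample_size_mean(sigma=55, d=5, level=0.95))
    print("corrected for N = 2000:", apply_finite_population_correction(n, 2000))
```

Rounding up rather than to the nearest integer keeps the achieved precision at least as tight as the specified D.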

SPSS Analysis

Strength of the Joint Effect of the Factors

The analysis of variance shows that the section of the newspaper and the day of the week have a powerful influence on the number of inquiries made by customers. Specifically, the section of the newspaper and the day of the week jointly account for 98.3% of the variation in the number of customer inquiries (R² = 0.983). According to Field (2017), the effect size is a parameter that takes values ranging from zero to one and indicates the degree of the overall impact of independent variables on a dependent variable. In this case, the independent variables are the section of the newspaper and the day of the week.

Significance of Variables

Since the editor suspected that the section of the newspaper and the day of the week are two factors that influence the number of customer inquiries, two-way analysis of variance provides the appropriate inference. The examination of main effects shows the significance of individual factors on the dependent variable (Field, 2017). The ANOVA model is statistically significant in predicting the effect of the section of the newspaper and the day of the week on the number of customer inquiries, F(15,45) = 171, p < 0.001.

The significance of the model implies that the overall effect has a considerable influence and thus requires analysis of the individual effects. The day of the week has a statistically significant effect on the number of customer inquiries, F(5,45) = 491.411, p < 0.001. Based on descriptive statistics, it is apparent that the number of customer inquiries varies according to the day of the week.

Friday registered the highest number of customer inquiries (M = 10.92, SD = 1.782), followed by Tuesday (M = 8.50, SD = 2.067), Wednesday (M = 8.17, SD = 1.697), Monday (M = 8.08, SD = 3.315), and Thursday (M = 6.00, SD = 1.758). Moreover, the ANOVA test indicates that the section of the newspaper has a statistically significant influence on the number of customer inquiries, F(2,45) = 15.304, p < 0.001. The line plot shows that advertising in the business section attracts the most customer inquiries, followed closely by the news section, while the sports section registers the fewest.

Since the section of the newspaper and the day of the week are individually statistically significant predictors of the number of customer inquiries, the analysis of the interaction shows their combined effects. According to the ANOVA, their interaction effects are statistically significant in influencing the number of customer inquiries, F(8,45) = 9.667, p < 0.001. The interaction effects are evident because the business section registers the highest number of customer inquiries on Monday and Thursday, while the news section experiences the highest number on Tuesday, Wednesday, and Friday.

However, the sports section registers the lowest number of customer inquiries on Monday through Wednesday, but it ranks second on Thursday and Friday. A critical examination of the line plot indicates that the interaction between the section of the newspaper and the day of the week exhibits a disordinal pattern with crossover. Malhotra et al. (2017) note that a disordinal pattern with crossover represents the strongest form of interaction. Therefore, the interaction effects of the section of the newspaper and the day of the week are not only statistically significant in influencing the number of customer inquiries but also of the strongest type.
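Sorting the reported day means gives the ranking of days directly. The short Python sketch below uses only the means quoted in this paragraph; the variable names are illustrative.

```python
# Mean number of customer inquiries per day, as reported in the text.
day_means = {
    "Monday": 8.08,
    "Tuesday": 8.50,
    "Wednesday": 8.17,
    "Thursday": 6.00,
    "Friday": 10.92,
}

# Rank days from the highest to the lowest mean number of inquiries.
ranked_days = sorted(day_means, key=day_means.get, reverse=True)

for rank, day in enumerate(ranked_days, start=1):
    print(rank, day, day_means[day])
```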

Pricing Strategy

The results of the ANOVA provide an evidence-based pricing strategy for paid adverts in the newspaper. Paid adverts should be priced according to the day of the week owing to differences in the number of customer inquiries. Based on the mean number of inquiries, adverts for Friday should be priced the highest, followed by Tuesday, Wednesday, Monday, and Thursday. Furthermore, paid adverts ought to be priced according to the section of the newspaper, with the business section priced the highest, the news section moderately, and the sports section the lowest. Due to the interaction effects, the business section ought to be the most expensive on Monday, followed by the news and sports sections.

On Tuesday and Wednesday, the news section should cost the most, followed by the business and sports sections. On Thursday, the business section ought to be the most expensive, trailed by the sports and news sections. In contrast, on Friday, the news section ought to have the highest price, followed by the sports and business sections.

Sporting Events

The following is the full regression model:

Attendance = -3666.537 + 170.894 (Temp) – 11.246 (Win) + 22.747 (OpWin) + 1823.671 (Weekend) + 9074.376 (Promotion)
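The fitted equation can be evaluated for a given game as in the Python sketch below. The coefficients are those of the model above. The function name is hypothetical, and the coding of the inputs is an assumption: Win and OpWin are taken as winning percentages, and Weekend and Promotion as 0/1 indicator variables, with temperature in the units of the original data set.

```python
# Coefficients of the full regression model reported in the text.
COEFFICIENTS = {
    "intercept": -3666.537,
    "temp": 170.894,
    "win": -11.246,
    "op_win": 22.747,
    "weekend": 1823.671,
    "promotion": 9074.376,
}


def predict_attendance(temp, win, op_win, weekend, promotion):
    """Predicted attendance from the full model.

    Assumes weekend and promotion are coded 1 (yes) or 0 (no).
    """
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["temp"] * temp
        + c["win"] * win
        + c["op_win"] * op_win
        + c["weekend"] * weekend
        + c["promotion"] * promotion
    )


if __name__ == "__main__":
    # Illustrative inputs only, not observations from the data set.
    base = predict_attendance(temp=70, win=50, op_win=50, weekend=0, promotion=0)
    promo = predict_attendance(temp=70, win=50, op_win=50, weekend=0, promotion=1)
    print(f"without promotion: {base:.0f}, with promotion: {promo:.0f}")
```

Holding the other predictors fixed, switching the promotion indicator from 0 to 1 changes the prediction by exactly the promotion coefficient.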

The regression analysis assumes that attendance has a linear relationship with temperature and with the winning percentages of both the preferred team and the opposing team, and that these predictors do not exhibit multicollinearity. Another assumption is that the residuals of attendance follow the normal distribution.

The regression model shows that temperature, the team’s winning percentage, the opposing team’s winning percentage, weekend, and promotion account for 38.6% of the variation in attendance (R² = 0.386). The regression model is statistically significant in predicting the effects of the independent variables on attendance at sporting events, F(5,73) = 9.175, p < 0.001. The analysis of the regression coefficients indicates that temperature (β = 170.894, p = 0.014) and promotion (β = 9074.376, p < 0.001) are statistically significant positive predictors of attendance. In contrast, the team’s win (β = -11.246, p = 0.559), the opposing team’s win (β = 22.747, p = 0.071), and weekend (β = 1823.671, p = 0.300) are statistically insignificant predictors of attendance.

The adjustment of the regression model is necessary to remove insignificant predictors and retain the significant ones. In this case, removing the team’s win and the opposing team’s win while retaining temperature and promotion would enhance the predictive effect of the regression model. Although weekend is not a statistically significant predictor, its coefficient corresponds to about 1,824 additional attendees, and thus it should be retained in the model to enhance ticket sales.
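The classification of predictors can be reproduced by comparing each reported p-value with a significance level. The Python sketch below assumes the conventional threshold of 0.05, which the text does not state explicitly; the variable names are illustrative.

```python
ALPHA = 0.05  # assumed significance level

# p-values of the regression coefficients as reported in the text.
# The value for Promotion is reported as 0.000 (rounded to three decimals).
reported_p = {
    "Temp": 0.014,
    "Win": 0.559,
    "OpWin": 0.071,
    "Weekend": 0.300,
    "Promotion": 0.000,
}

significant = [name for name, p in reported_p.items() if p < ALPHA]
insignificant = [name for name, p in reported_p.items() if p >= ALPHA]

print("significant:", significant)
print("insignificant:", insignificant)
```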

References

Field, A. (2017). Discovering statistics using IBM SPSS statistics (5th ed.). Thousand Oaks, CA: SAGE Publications.

Malhotra, N. K., Nunan, D., & Birks, D. F. (2017). Marketing research: An applied approach (5th ed.). New York, NY: Pearson.

Roy, T. K., & Acharya, R. (2016). Statistical survey design and evaluating impact. New York, NY: Cambridge University Press.