SFO Survey Machine Learning Example

Each year, San Francisco Airport (SFO) conducts a customer satisfaction survey to find out what they are doing well and where they can improve. The survey gauges satisfaction with SFO facilities, services, and amenities. SFO compares results to previous surveys to discover elements of the guest experience that are not satisfactory.

The 2013 SFO Survey Results consist of customer responses to survey questions and an overall satisfaction rating for the airport. We investigated whether we could use machine learning to predict a customer's overall rating from their responses to the individual questions. On its own that is not very useful, because the customer has already provided an overall rating as well as individual ratings for various aspects of the airport, such as parking, food quality, and restroom cleanliness. However, we didn't stop at prediction. Instead, we asked the question:

What factors drove the customer to give the overall rating?

Here is an outline of our data flow:

- Load data: Load the data as a DataFrame.
- Understand the data: Compute statistics and create visualizations to get a better understanding of the data and to see whether basic statistics can answer the question above.
- Create model: Fit a model on the training dataset.
- Evaluate the model: Now look at the test dataset. Compare the initial model with the tuned model to see the benefit of tuning parameters.
- Feature importance: Determine how much each individual rating contributes to the customer's overall rating.

This dataset is available as a public dataset from https://catalog.data.gov/dataset/2013-sfo-customer-survey-d3541.

Table

As you can see above, the survey contains many questions, including which airline the customer flew and where they live. To answer the question above, focus on questions Q7A through Q7O, since they relate directly to customer satisfaction, which is what you want to measure. Drilling down on those variables gives the following:

Column Name|Data Type|Description
---|---|---
Q7B_FOOD|INTEGER|Restaurants
Q7C_SHOPS|INTEGER|Retail shops and concessions
Q7D_SIGNS|INTEGER|Signs and directions inside SFO
Q7E_WALK|INTEGER|Escalators / elevators / moving walkways
Q7F_SCREENS|INTEGER|Information on screens and monitors
Q7G_INFOARR|INTEGER|Information booth near arrivals area
Q7H_INFODEP|INTEGER|Information booth near departure areas
Q7I_WIFI|INTEGER|Airport WiFi
Q7J_ROAD|INTEGER|Signs and directions on SFO airport roadways
Q7K_PARK|INTEGER|Airport parking facilities
Q7L_AIRTRAIN|INTEGER|AirTrain
Q7M_LTPARK|INTEGER|Long term parking lot shuttle
Q7N_RENTAL|INTEGER|Airport rental car center
Q7O_WHOLE|INTEGER|SFO Airport as a whole

The possible values for the above are:

0 = no answer, 1 = Unacceptable, 2 = Below Average, 3 = Average, 4 = Good, 5 = Outstanding, 6 = Not visited or not applicable

Select only the fields you are interested in.

"'missingValues(Q7A_ART) Q7A_ART', 'missingValues(Q7B_FOOD) Q7B_FOOD', 'missingValues(Q7C_SHOPS) Q7C_SHOPS', 'missingValues(Q7D_SIGNS) Q7D_SIGNS', 'missingValues(Q7E_WALK) Q7E_WALK', 'missingValues(Q7F_SCREENS) Q7F_SCREENS', 'missingValues(Q7G_INFOARR) Q7G_INFOARR', 'missingValues(Q7H_INFODEP) Q7H_INFODEP', 'missingValues(Q7I_WIFI) Q7I_WIFI', 'missingValues(Q7J_ROAD) Q7J_ROAD', 'missingValues(Q7K_PARK) Q7K_PARK', 'missingValues(Q7L_AIRTRAIN) Q7L_AIRTRAIN', 'missingValues(Q7M_LTPARK) Q7M_LTPARK', 'missingValues(Q7N_RENTAL) Q7N_RENTAL', 'missingValues(Q7O_WHOLE) Q7O_WHOLE'"
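The output above reads like a list of select expressions, each wrapping a rating column in a `missingValues` function and aliasing the result back to the column name. A minimal sketch of how such a list could be built is shown here in plain Python. The body of `missing_values` is an assumption: the output does not show how the notebook treats the codes 0 (no answer) and 6 (not visited or not applicable), so this hypothetical version simply maps them to `None`.

```python
# Rating columns Q7A..Q7O, as they appear in the select expressions above.
RATING_COLUMNS = [
    "Q7A_ART", "Q7B_FOOD", "Q7C_SHOPS", "Q7D_SIGNS", "Q7E_WALK",
    "Q7F_SCREENS", "Q7G_INFOARR", "Q7H_INFODEP", "Q7I_WIFI", "Q7J_ROAD",
    "Q7K_PARK", "Q7L_AIRTRAIN", "Q7M_LTPARK", "Q7N_RENTAL", "Q7O_WHOLE",
]


def build_select_expressions(columns):
    """Wrap each column in missingValues(...) and alias it to its own name."""
    return [f"missingValues({name}) {name}" for name in columns]


def missing_values(rating):
    """Hypothetical stand-in for the notebook's missingValues function.

    Assumption: codes 0 and 6 carry no rating, so they become None.
    Ratings 1 through 5 pass through unchanged.
    """
    if rating in (0, 6):
        return None
    return rating


select_expressions = build_select_expressions(RATING_COLUMNS)
print(select_expressions[0])
print([missing_values(r) for r in (0, 1, 5, 6)])
```

Mapping the two non-rating codes to a null value keeps them out of any later averages; a different notebook might instead substitute a neutral rating.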

[Row(Q7O_WHOLE=3.8776130748764728)]
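The row above appears to be the mean of the overall rating, Q7O_WHOLE. As a rough sketch of that kind of summary, the snippet below averages a column while skipping responses that carry no rating. The sample responses are made up for illustration and are not taken from the survey, and representing missing answers as `None` is an assumption.

```python
from statistics import mean


def mean_rating(ratings):
    """Average the ratings, ignoring entries that are None (no usable answer)."""
    valid = [r for r in ratings if r is not None]
    return mean(valid) if valid else None


# Hypothetical responses for one rating column; None marks a missing answer.
sample_overall = [4, 5, None, 3, 4, None, 4]
print(mean_rating(sample_overall))
```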

Table

Visualization

2,631 rows

Table

Table

0.5470732399238778

SparseVector(14, {0: 0.0546, 1: 0.1245, 3: 0.5302, 5: 0.2296, 7: 0.0109, 8: 0.0046, 9: 0.0119, 13: 0.0337})
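The SparseVector above lists an importance only for the features the model actually used, keyed by position in the feature vector. A small sketch of turning those positions back into column names and ranking them follows. It assumes the 14 features were assembled in the order Q7A_ART through Q7N_RENTAL, with Q7O_WHOLE as the label; the output does not confirm that ordering.

```python
# Assumed feature order: Q7A..Q7N (Q7O_WHOLE is treated as the label).
FEATURE_COLUMNS = [
    "Q7A_ART", "Q7B_FOOD", "Q7C_SHOPS", "Q7D_SIGNS", "Q7E_WALK",
    "Q7F_SCREENS", "Q7G_INFOARR", "Q7H_INFODEP", "Q7I_WIFI", "Q7J_ROAD",
    "Q7K_PARK", "Q7L_AIRTRAIN", "Q7M_LTPARK", "Q7N_RENTAL",
]

# Non-zero entries copied from the SparseVector output above.
IMPORTANCES = {
    0: 0.0546, 1: 0.1245, 3: 0.5302, 5: 0.2296,
    7: 0.0109, 8: 0.0046, 9: 0.0119, 13: 0.0337,
}


def rank_importances(columns, sparse_values):
    """Pair every column with its importance (0.0 if absent), highest first."""
    pairs = [(name, sparse_values.get(i, 0.0)) for i, name in enumerate(columns)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked = rank_importances(FEATURE_COLUMNS, IMPORTANCES)
top_three = ranked[:3]
top_three_share = sum(value for _, value in top_three)

for name, value in top_three:
    print(f"{name}: {value:.4f}")
print(f"Top three combined: {top_three_share:.4f}")
```

Under that ordering assumption, the three largest entries fall on Q7D_SIGNS, Q7F_SCREENS, and Q7B_FOOD, which agrees with the three features named in the text.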

Table

As you can see below, the 3 most important features are:

Signs
Screens
Food

This is useful information for airport management. It suggests that people first want to know where they are going. Second, they check the airport screens and monitors so they can find their gate and be on time for their flight. Third, they like to have good quality food.

This is especially interesting because taking the average of these feature variables told us nothing about how important each variable is in determining the respondent's overall rating.

Together, these 3 features account for about 88% of the total feature importance, as the sum below shows.

[Row(sum(Importance)=0.884304077071294)]

Visualization

14 rows

Visualization

5 rows

True
