In this article, we show you how to create a simple neural network model to predict if housing prices are above the median or not using Keras, and later how to deploy the model using Google Cloud Platform tools.
To follow along, you should be familiar with Google Colaboratory or Jupyter Notebook and have the packages used in this article installed: pandas, NumPy, scikit-learn, TensorFlow (with Keras), and Matplotlib.
You also need a GCP account. You can set one up with your Google (Gmail) account at the following link: https://cloud.google.com/
Steps to cover in this article
• Exploring the data
• Normalizing the data set
• Building a neural network
• Training a neural network
• Viewing loss and accuracy
• Deploy model on Google Cloud Platform (GCP)
• Save data to be predicted in JSON format
• Predict using the GCP neural network model
Exploring the Data
The first step is to explore and understand the data before normalizing it. To do this, let’s use the pandas library. Open a new Python notebook in Colab or Jupyter, import the pandas library, and load the data set.
The data set can be downloaded directly from GitHub using the link: https://github.com/markitosanches/machine/blob/master/housepricedata.csv
This data set is an adaptation from Zillow’s Home Value Prediction Kaggle competition data.
import pandas as pd
data = pd.read_csv("housepricedata.csv")
The first 10 columns are our input features:
- Lot Area (in sq ft)
- Overall Quality (scale from 1 to 10)
- Overall Condition (scale from 1 to 10)
- Total Basement Area (in sq ft)
- Number of Full Bathrooms
- Number of Half Bathrooms
- Number of Bedrooms above ground
- Total Number of Rooms above ground
- Number of Fireplaces
- Garage Area (in sq ft)
The last column is the label that we would like to predict:
- Is the house price above the median or not? (1 for yes and 0 for no)
Machine learning models process numeric arrays, so it is necessary to convert the data into an array.
dataset = data.values
This would be the output:
array([[ 8450, 7, 5, ..., 0, 548, 1],
[ 9600, 6, 8, ..., 1, 460, 1],
[11250, 7, 5, ..., 1, 608, 1],
[ 9042, 7, 9, ..., 2, 252, 1],
[ 9717, 5, 6, ..., 0, 240, 0],
[ 9937, 5, 6, ..., 0, 276, 0]])
We can explore the data using the attribute dataset.shape; the result is the tuple (1460, 11). It means an array with 1460 rows and 11 columns.
The next step is to split the data set into the input features “X” and the label to be predicted “y”. “X” holds columns 1 to 10, while “y” is column 11. For this we will create 2 variables, “X” and “y”.
X = dataset[:,0:10]
y = dataset[:,10]
X.shape, y.shape
After creating the variables, we can inspect them with the shape attribute and see that X has shape (1460, 10) and y has shape (1460,).
Normalizing the Data set
The idea of normalization is to adjust values measured on different scales to a common scale, often prior to averaging. One way to scale the data is to use an existing scikit-learn package.
We will use the min-max scaler, which rescales the data set so that every input feature lies between 0 and 1:
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()
X_scale = min_max_scaler.fit_transform(X)
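To make the transformation concrete, here is a minimal pure-Python sketch of what min-max scaling to the range 0 to 1 computes for a single column: each value x becomes (x - min) / (max - min). The function name min_max_scale is hypothetical and used only for illustration; the three lot areas are taken from the first rows of the array output shown earlier.

```python
def min_max_scale(column):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # Constant column: assumption for this sketch, map every value to 0.0
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]

lot_area = [8450, 9600, 11250]  # first three lot areas from the output above
scaled = min_max_scale(lot_area)
print(scaled)
```

The smallest lot area maps to 0, the largest to 1, and everything else falls in between in proportion to its distance from the minimum.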
The normalized data is saved in a variable called X_scale. Now we need to split the data into training, validation, and test sets. We will import the train_test_split function from scikit-learn to do this. We first hold out 30% of the data, then use half of that held-out portion for validation and half for testing. This means that the training set will be 70% of the data, the validation set 15%, and the test set 15%.
from sklearn.model_selection import train_test_split
# Train and test set (30%)
X_train, X_val_and_test, y_train, y_val_and_test = train_test_split(X_scale, y, test_size=0.3)
# Validation set (50% of test set)
X_val, X_test, y_val, y_test = train_test_split(X_val_and_test, y_val_and_test, test_size=0.5)
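As a quick sanity check on those percentages, the arithmetic below works out how many of the 1460 rows land in each set. It is a sketch that assumes the held-out count for a fractional test_size is rounded up, which is how we understand scikit-learn to behave; the helper name split_sizes is hypothetical.

```python
import math

def split_sizes(n_rows, holdout_fraction):
    """Return (kept, held_out) row counts for one split."""
    held_out = math.ceil(n_rows * holdout_fraction)  # assumption: round up
    return n_rows - held_out, held_out

n_train, n_val_and_test = split_sizes(1460, 0.3)  # first split: 70% / 30%
n_val, n_test = split_sizes(n_val_and_test, 0.5)  # second split: half and half

print(n_train, n_val, n_test)  # 1022 219 219
```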
Building a Neural Network
The first step in building a neural network is to define the architecture. For this article, we will use a sequential model with an input of size 10 (the number of features defined for X), 2 hidden layers with 100 neurons each using ReLU activation, and an output layer with 1 neuron and sigmoid activation.
The model uses the Adam optimizer and mean squared error loss.
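For readers new to these terms, the following sketch writes out the two activation functions and the loss in plain Python. It only illustrates the math (Keras uses its own vectorized implementations), and the example labels and inputs are made up.

```python
import math

def relu(z):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, z)

def sigmoid(z):
    """Sigmoid: squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels and sigmoid outputs, for illustration only
labels = [1, 0, 1]
outputs = [sigmoid(2.0), sigmoid(-1.0), sigmoid(0.0)]
print(relu(-3.0), relu(2.5), mean_squared_error(labels, outputs))
```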
To execute the described architecture we will use Keras.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf
Create the model. It will be stored in a variable named model.
model = Sequential()
model.add(Dense(100, activation="relu", input_shape=(10,)))
model.add(Dense(100, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
model.compile(optimizer="adam", loss="mean_squared_error", metrics=['accuracy'])
Print the model summary.
model.summary()
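The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-output pair plus one bias per output. The sketch below applies that rule to the architecture described above (10 inputs, two hidden layers of 100 neurons, 1 output); the helper name dense_params is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

layer_sizes = [10, 100, 100, 1]  # input size, hidden 1, hidden 2, output
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [1100, 10100, 101] 11301
```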
Plot the model.
tf.keras.utils.plot_model(model, 'my_first_model.png', show_shapes=True)
Training a Neural Network
Now that our model is ready, we can train it and analyze the loss and accuracy. To train the model, we will use the X_train and y_train data for 100 epochs, with X_val and y_val as the validation set, and save the training history in a variable called hist.
hist = model.fit(X_train, y_train, batch_size=32, epochs=100, validation_data=(X_val, y_val))
Viewing Loss and Accuracy
We can evaluate the model’s loss and accuracy on the test set by executing the following command. The first value is the loss (the lower the better); the second is the accuracy (the higher the better).
model.evaluate(X_test, y_test)
We can also plot a graph for better analysis. For this we need the Matplotlib library.
import matplotlib.pyplot as plt
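One way to draw the graph is sketched below. It assumes the training history is a dictionary with 'loss' and 'val_loss' lists, which is what hist.history holds after model.fit when validation data is supplied; the function name plot_loss and the sample numbers are hypothetical so that the block runs on its own.

```python
import matplotlib.pyplot as plt

def plot_loss(history):
    """Plot training and validation loss per epoch and return the figure."""
    fig, ax = plt.subplots()
    ax.plot(history["loss"], label="Train")
    ax.plot(history["val_loss"], label="Val")
    ax.set_title("Model loss")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss")
    ax.legend(loc="upper right")
    return fig

# Made-up numbers for illustration; in the notebook, pass hist.history instead
sample_history = {"loss": [0.25, 0.18, 0.12], "val_loss": [0.26, 0.20, 0.15]}
fig = plot_loss(sample_history)
```

In the notebook, call plot_loss(hist.history) followed by plt.show(). The accuracy curves can be plotted the same way, although the key name ('acc' or 'accuracy') depends on the TensorFlow version.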
Deploy Model on Google Cloud Platform
To deploy the model using GCP, you must have a project ID. If you don’t have one, open your GCP console and create a new project. You can find the project ID on the project page. From here on, we can do all the configuration from the Python notebook.
Set up some global variables for our GCP project: your project ID, a name for your bucket (storage in GCP), the server location, the model version, and the model name.
GCP_PROJECT = 'INSERT HERE YOUR PROJECT ID'
KERAS_MODEL_BUCKET = 'gs://house-prediction-gcp'
REGION = 'us-central1'
KERAS_VERSION_NAME = 'v1'
MODEL = 'model_house'
Run the code below to authenticate your GCP credentials and follow the instructions: click the link to sign in with your Google account, copy the authentication code, paste it into the field, and press Enter.
import sys
if 'google.colab' in sys.modules:
    from google.colab import auth as google_auth
    google_auth.authenticate_user()
else:
    %env GOOGLE_APPLICATION_CREDENTIALS ''
Create your bucket, the storage location in GCP. You can do this directly from the console or by running the following code. The bucket name will be the value of the KERAS_MODEL_BUCKET variable.
! gsutil mb -p $GCP_PROJECT -l $REGION $KERAS_MODEL_BUCKET
# Display what is in the bucket
!gsutil ls -al $KERAS_MODEL_BUCKET
Configure your GCP project using the variable GCP_PROJECT.
!gcloud config set project $GCP_PROJECT
Export your model to your bucket in a folder called keras_export. This folder will contain a file named saved_model.pb. This will be your saved model in GCP.
export_path = tf.contrib.saved_model.save_keras_model(model, KERAS_MODEL_BUCKET + '/keras_export')
Create a model on GCP. The model name will be the value of the variable MODEL.
!gcloud ai-platform models create $MODEL
Create a version for your model. The version will be linked to the exported model in your bucket (keras_export/saved_model.pb). The version name will be the value of the variable KERAS_VERSION_NAME.
!gcloud beta ai-platform versions create $KERAS_VERSION_NAME --model $MODEL \
We’re ready. Let’s create a JSON file to test our deployed model.
Save data to be predicted in JSON format
To use our model on GCP, we must submit a request in JSON format. We will create a JSON object, save it locally, and send it in a request to our model on GCP.
To test our model, we take values from our X_test sample and convert them to JSON.
[0.03430788, 0.66666667, 0.5, 0.14729951, 0.66666667, 0.5, 0.375, 0.41666667, 0.33333333, 0.45416079]
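A minimal sketch of saving that instance to disk with Python's json module follows. The file name predictions.json matches the one passed to gcloud in the next step; writing one JSON value per line is our understanding of what the --json-instances option expects, so treat that as an assumption.

```python
import json

# One instance: the 10 scaled feature values shown above
instance = [0.03430788, 0.66666667, 0.5, 0.14729951, 0.66666667,
            0.5, 0.375, 0.41666667, 0.33333333, 0.45416079]

# Write one JSON-encoded instance per line
with open("predictions.json", "w") as f:
    f.write(json.dumps(instance) + "\n")

# Read the file back to confirm it round-trips
with open("predictions.json") as f:
    loaded = json.loads(f.readline())
print(len(loaded))  # 10
```

To send several houses in one request, write one list per line in the same file.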
Predict using the GCP neural network model
Finally, we can test our model by sending the JSON file directly to our GCP model.
prediction = !gcloud ai-platform predict --model=$MODEL --json-instances=predictions.json --version=$KERAS_VERSION_NAME
The result of this prediction was [0.9708504676818848]; this result may change depending on the data split.
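Because the output layer uses a sigmoid, the returned number lies between 0 and 1 and can be read as a score for the "above the median" class. The sketch below turns it into a yes/no answer using a 0.5 cut-off; that threshold and the function name above_median are assumptions for illustration, not something the deployed model applies for you.

```python
def above_median(score, threshold=0.5):
    """Map a sigmoid output to the label used in the data set (1 = above median)."""
    return 1 if score >= threshold else 0

score = 0.9708504676818848  # value returned by the prediction in this article
print(above_median(score))  # 1
```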
The full code can be found on my GitHub.
About the author:
Marcos Sanches is a business management specialist with strong knowledge in analysis, design and execution of information technologies. Practical experience in Brazil, France and Canada. Critical sense of analysis with strategic vision and market viability. Updated with the latest technologies, using this knowledge and experience to optimize or build internal business processes, applying the best tools and practices. Beginner in Machine & Deep learning.
Professor of Computer Science in Canada at College and University.
All comments and feedback are welcome.
Reference: Part of this article references the article “Build Your First Neural Network to Predict House Prices with Keras,” written by Joseph Lee Wei En. The full article can be found at: https://hackernoon.com/build-your-first-neural-network-to-predict-house-prices-with-keras-3fb0839680f4