Google Cloud Platform - What's the Big Deal?
Author: Brian Nettles | Date Created: May 18, 2018
The Google Cloud Platform is one of several cloud platforms available for hosting your applications and enterprise servers, whatever the size of your company.
The biggest competitors to the Google Cloud Platform are Amazon Web Services (AWS) and Microsoft Azure. In fact, Google was a late entrant to this market, and its market share is therefore significantly smaller than theirs. Yet that is no reason to shy away.
The greatest differentiator is that Google's cloud offering is not like the others under the hood. Google has taken its own proprietary technologies, the same ones it uses for its other products, and made them commercially available to you.
Let's go down the offerings.
Google App Engine
App Engine is a platform as a service that allows developers to write their code and deploy it without having to worry about the infrastructure. It scales automatically, and the infrastructure is fully managed by Google. Supported languages include Python, Java, Node.js, and C#. This works well for many people and many applications. However, there is no persistent storage on the server itself, and you don't have access to a .htaccess file or an Apache configuration file, which imposes some limitations.
To deploy your applications, follow Google's documentation on deploying to App Engine (GAE). The basic approach is this:
a. Add an app.yaml file to your application following Google's specifications.
b. With the gcloud command-line tool (part of the Google Cloud SDK) already installed on your local machine, initialize your environment: gcloud init
c. From the base directory of your source code, run: gcloud app deploy
It is that simple.
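The three steps above can be sketched as a small Python helper. This is only an illustration: the helper names are hypothetical, and the one-line app.yaml (with `python39` as the runtime) is an assumed minimal configuration, so check Google's documentation for the settings your application actually needs.

```python
from pathlib import Path
import tempfile

# Assumed minimal configuration; "python39" is one App Engine standard
# runtime name and may not be the right choice for your application.
APP_YAML = "runtime: python39\n"


def write_app_yaml(project_dir: Path, contents: str = APP_YAML) -> Path:
    """Step a: create app.yaml in the base directory of the source code."""
    path = project_dir / "app.yaml"
    path.write_text(contents)
    return path


def deploy_commands() -> list:
    """Steps b and c as argument lists for subprocess.run(cmd, cwd=project_dir)."""
    return [["gcloud", "init"], ["gcloud", "app", "deploy"]]


if __name__ == "__main__":
    # Dry run: write the file to a scratch directory and print the commands
    # instead of executing them.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_app_yaml(Path(tmp)).read_text(), end="")
    for cmd in deploy_commands():
        print(" ".join(cmd))
```

Running the commands by hand from a terminal is just as good; the sketch only shows how little is involved.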
If you want more control over your specific environment, try Google Compute Engine.
Google Compute Engine
GCE gives you access to as many virtual machines as you like. It is very similar to EC2 in the Amazon Web Services world. But what sits underneath may not be what you think. This is not VMware. Compute Engine VMs run on Google's own KVM-based hypervisor, on the same infrastructure Google uses for its other products. In general, they work very well.
From my practical experience with Google's implementation, the VMs usually work just like a real machine. Periodically I run into things that should work but don't. Nevertheless, there is usually a workaround.
Another benefit of GCE is its ease of access for users with proper authorization. With EC2, getting access to the machine by SSH and uploading files to the machine was always somewhat of a hassle to set up. I find the experience with GCE much simpler.
I currently make heavy use of GCE and am rather pleased with it.
Cloud SQL is a managed MySQL or PostgreSQL database service. When creating an instance, you decide on disk size, memory size, number of CPUs, high availability, and several other options. Google handles backups and other maintenance. Only machines that have been specifically granted access can connect. It is a solid solution for most applications that rely on MySQL or PostgreSQL. Cloud SQL is reportedly limited to 4,000 concurrent connections and up to 10 TB of data.
Now and again, a company will outgrow Cloud SQL. Google's answer is Cloud Spanner. It is not MySQL under the hood. It is Google's own relational database, which keeps SQL queries and strongly consistent transactions while replicating data across regions and scaling horizontally. Where you might once have reached for something like Oracle RAC, Cloud Spanner aims to be an effectively unbounded relational database available worldwide.
Many applications are turning to NoSQL databases, and there are many on the market. Cloud Datastore is Google's NoSQL database. With NoSQL databases, you break away from many of the tough constraints of the SQL world: you flatten out your tables, rely less on database-enforced data integrity, and look records up by key wherever possible. These tables are schema-less. How is this good? First, simple key lookups are fast and scale well. Second, adjusting applications to changes is simple. Third, writing code to insert into and read from this database is ridiculously easy.
Another important point about Cloud Datastore has to do with application design, even when the application uses a separate SQL database. Google Cloud trainer Jason Baker, in his Coursera courses, points out that Google recommends using Cloud Datastore to store application state. Because applications are now written to run on rapidly scalable compute machines, the state used for application flow cannot be held on those machines. Look to Cloud Datastore for storing this temporary state.
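To make the idea of externalized state concrete, here is a minimal sketch in plain Python. The `StateStore` class is a hypothetical in-memory stand-in for a schema-less key/value store, not the Cloud Datastore client API; the point is only that the request handler keeps nothing on the machine it runs on.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

# (kind, identifier), loosely modeled on how a key/value store addresses entities.
Key = Tuple[str, str]


@dataclass
class StateStore:
    """In-memory stand-in for an external, schema-less key/value store."""
    _entities: Dict[Key, Dict[str, Any]] = field(default_factory=dict)

    def put(self, kind: str, ident: str, properties: Dict[str, Any]) -> None:
        # No schema: each entity is just a bag of properties.
        self._entities[(kind, ident)] = dict(properties)

    def get(self, kind: str, ident: str) -> Optional[Dict[str, Any]]:
        entity = self._entities.get((kind, ident))
        return dict(entity) if entity is not None else None


def handle_request(store: StateStore, session_id: str, item: str) -> int:
    """Stateless handler: the cart lives in the store, not on the instance."""
    state = store.get("Session", session_id) or {"cart": []}
    state["cart"] = state["cart"] + [item]
    store.put("Session", session_id, state)
    return len(state["cart"])
```

Because the handler reads and writes all of its state through the store, any instance can serve any request, and instances can be added or removed freely.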
Cloud Storage is for BLOBs (binary large objects). It is used to store and retrieve CSV files, startup scripts for GCE instances, public images for websites, MP3 files, MOV files, or any other form of immutable data.
Cloud Storage also has options for regional or multi-regional storage. Regional storage is cheaper. Multi-regional storage has more replication and higher availability. However, if you use CSV files to store data, regional storage can be faster and incur lower egress fees, because imports into databases located in the same region travel a shorter distance. Cloud Storage also has Nearline and Coldline storage for objects accessed much less frequently.
Cloud Storage objects can also be lifecycle managed so that they persist only as long as you specify and are then deleted automatically.
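As a rough illustration of lifecycle management, the sketch below builds a lifecycle configuration that deletes objects once they reach a given age. The helper name and the bucket name in the comment are hypothetical; the JSON layout follows the rule/action/condition structure used by Cloud Storage lifecycle configurations.

```python
import json


def delete_after_days(days: int) -> dict:
    """Build a lifecycle configuration that deletes objects older than `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return {"rule": [{"action": {"type": "Delete"}, "condition": {"age": days}}]}


if __name__ == "__main__":
    # Save the output as lifecycle.json and apply it with, for example:
    #   gsutil lifecycle set lifecycle.json gs://my-example-bucket
    # (the bucket name is hypothetical)
    print(json.dumps(delete_after_days(30), indent=2))
```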
BigQuery is a rapidly scalable SQL data warehouse ready to crunch large amounts of data. Say you have multiple CSV files that collectively hold billions of rows of data, and you want to run SQL on this data. You can quickly load all these CSV files into a table (or multiple tables), and then run normal SQL, including joins and other standard SQL commands. Behind the scenes, Google immediately allocates as many processors as are needed to get you the results quickly. If you have worked with other databases and spent your time tuning indexes, the results you see here are quite amazing.
Your tables can be permanently persisted or persisted only for short periods of time depending on your need. This engine is commonly used for reporting as well as in ETL processing.
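The load-then-query workflow can be tried locally on a tiny scale. The sketch below uses Python's built-in sqlite3 module purely as a stand-in (it is not BigQuery and does not use the BigQuery client); the CSV contents, table names, and helper names are made up for illustration.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for much larger CSV files.
ORDERS_CSV = "order_id,customer_id,amount\n1,10,25.0\n2,11,40.0\n3,10,15.5\n"
CUSTOMERS_CSV = "customer_id,name\n10,Ada\n11,Grace\n"


def load_csv(conn: sqlite3.Connection, table: str, text: str) -> None:
    """Create `table` from the CSV header row and bulk-insert the data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    conn.execute(f"CREATE TABLE {table} ({', '.join(header)})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)


def total_by_customer(conn: sqlite3.Connection) -> list:
    """Ordinary SQL with a join and an aggregate over the loaded tables."""
    query = """
        SELECT c.name, SUM(CAST(o.amount AS REAL)) AS total
        FROM orders AS o
        JOIN customers AS c ON o.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "orders", ORDERS_CSV)
    load_csv(conn, "customers", CUSTOMERS_CSV)
    print(total_by_customer(conn))
```

The shape of the work is the same in BigQuery: load the files into tables, then write plain SQL. The difference is that no index design is involved and the processing is spread across Google's machines.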
Bigtable and Dataproc
Bigtable is Google's wide-column NoSQL database, built for very large, heavy data-crunching workloads. It offers an HBase-compatible API, so it plugs into the Hadoop ecosystem.
Dataproc is Google's managed Hadoop service. You could spend a couple of hours using GCE to create a Hadoop cluster, or you could do a couple of clicks and have Google create it for you. Dataproc clusters also include Spark and Hive for additional analysis.
Pub/Sub is a messaging service. A comparison to a Twitter feed is apt. You create topics and publish data to a topic. Each message carries a data payload, which can be JSON, plus optional name/value attributes. Other applications can subscribe to the topic and retrieve the data on demand (pull) or on arrival (push). Pub/Sub works with other Google services, including Gmail, and reportedly handles as many as 300 million messages per second. Paired with Cloud Dataflow, it is a good fit for ETL pipelines.
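The topic and subscription model can be sketched in a few lines of plain Python. This `Topic` class is a hypothetical in-process toy, not the Pub/Sub client library: it only shows that every subscription sees every message, delivered either on arrival (push) or on demand (pull).

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional

Message = Dict[str, object]


class Topic:
    """Toy in-process topic: every subscription receives every published message."""

    def __init__(self) -> None:
        self._pull_queues: Dict[str, Deque[Message]] = {}
        self._push_handlers: Dict[str, Callable[[Message], None]] = {}

    def subscribe(self, name: str,
                  on_message: Optional[Callable[[Message], None]] = None) -> None:
        # With a callback the subscription is push-style; without one it is pull-style.
        if on_message is None:
            self._pull_queues[name] = deque()
        else:
            self._push_handlers[name] = on_message

    def publish(self, data: bytes, **attributes: str) -> None:
        message: Message = {"data": data, "attributes": dict(attributes)}
        for handler in self._push_handlers.values():
            handler(message)          # delivered on arrival
        for queue in self._pull_queues.values():
            queue.append(message)     # held until the subscriber pulls

    def pull(self, name: str, max_messages: int = 10) -> List[Message]:
        queue = self._pull_queues[name]
        count = min(max_messages, len(queue))
        return [queue.popleft() for _ in range(count)]
```

The real service adds acknowledgements, retries, and durable storage, none of which this toy attempts.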
Cloud Dataflow is Google's managed service for running Apache Beam pipelines. Apache Beam is used for processing data in batches or as continuous streams. It works well with Pub/Sub to retrieve messages as they come in. Application logic for this service can be written in Java or Python using the Apache Beam SDK. As Pub/Sub messages arrive, you can use application logic to transform the data and insert it into BigQuery or any other location of choice. BigQuery is recommended if you need to query large amounts of processed data very quickly.
In addition to all of the services and offerings available to you, Google Cloud Platform has these other advantages.
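The retrieve, transform, and insert flow can be sketched with plain Python generators. This is not Apache Beam code; the stage names, the record fields (`user`, `amount`), and the batching step are assumptions made for illustration of how a pipeline chains stages together.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse(messages: Iterable[bytes]) -> Iterator[Dict]:
    """Decode JSON payloads, skipping messages that are not valid JSON."""
    for raw in messages:
        try:
            yield json.loads(raw)
        except json.JSONDecodeError:
            continue  # a real pipeline might route bad records elsewhere


def transform(records: Iterable[Dict]) -> Iterator[Dict]:
    """Normalize each record into the row shape the destination table expects."""
    for record in records:
        yield {
            "user": record["user"].lower(),
            "amount_cents": round(record["amount"] * 100),
        }


def batches(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Group rows into fixed-size batches, as a sink doing bulk inserts might."""
    batch: List[Dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    incoming = [b'{"user": "Ann", "amount": 12.5}', b"not json"]
    for batch in batches(transform(parse(incoming)), size=2):
        print(batch)
```

Each stage consumes the previous one lazily, which is the same mental model Beam uses for both bounded (batch) and unbounded (streaming) inputs.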
1. The offerings are very user friendly with good documentation.
2. The pricing is generally less expensive than AWS.
3. Sustained use discounts are applied automatically based on how much of the month your instances are running. The discount grows in tiers, with usage in the 75-100 percent tier billed at the lowest rate, for a net discount of up to roughly 30% on an instance that runs all month. Google also bills Compute Engine in fine-grained increments (per second, after a one-minute minimum) rather than by the hour. In other words, if your GCE instance is only on for five minutes, you are charged for five minutes, not one hour.
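The billing arithmetic can be worked through with a short sketch. The tier multipliers below are an assumption for illustration (each successive quarter of the month billed at 100%, 80%, 60%, and 40% of the base rate); check Google's current pricing pages for the figures that apply to your machine types.

```python
import math

# Assumed multipliers for each successive quarter of the month.
TIER_RATES = (1.0, 0.8, 0.6, 0.4)


def sustained_use_multiplier(fraction_of_month: float) -> float:
    """Effective price multiplier for an instance running this fraction of the month."""
    if not 0 < fraction_of_month <= 1:
        raise ValueError("fraction_of_month must be in (0, 1]")
    billed = 0.0
    for i, rate in enumerate(TIER_RATES):
        tier_start = i * 0.25
        # Portion of this tier actually used, clamped to [0, 0.25].
        used = min(max(fraction_of_month - tier_start, 0.0), 0.25)
        billed += used * rate
    return billed / fraction_of_month


def billed_minutes(runtime_minutes: float, increment_minutes: float) -> float:
    """Round runtime up to the next billing increment."""
    return math.ceil(runtime_minutes / increment_minutes) * increment_minutes


if __name__ == "__main__":
    print(sustained_use_multiplier(1.0))   # effective multiplier for a full month
    print(billed_minutes(5, 1), billed_minutes(5, 60))
```

Under these assumed tiers, an instance that runs the whole month pays about 70% of the base price, and a five-minute run is billed as five minutes under per-minute increments but as sixty under per-hour increments.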
Enjoy your cloud computing time.