• Google Compute Engine Setup
  • Prerequisites
    • Install Google Cloud SDK
    • Install bdutil
  • Deploying Flink on Google Compute Engine
    • Set up a bucket
    • Adapt the bdutil config
    • Adapt the Flink config
    • Bring up a cluster with Flink
    • Run a Flink example job
    • Shut down your cluster

    Google Compute Engine Setup

    This documentation provides instructions on how to set up Flink fully automatically with Hadoop 1 or Hadoop 2 on top of a Google Compute Engine cluster. This is made possible by Google’s bdutil, which starts a cluster and deploys Flink with Hadoop. To get started, follow the steps below.

    Prerequisites

    Install Google Cloud SDK

    Please follow the instructions on how to set up the Google Cloud SDK. In particular, make sure to authenticate with Google Cloud using the following command:

        gcloud auth login
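
    To check whether the login step is still needed, for example in a setup script, the following sketch asks gcloud for the currently active account. The function name has_active_gcloud_account is hypothetical; it relies only on the --filter and --format flags of gcloud auth list.

```bash
#!/usr/bin/env bash
# Sketch: succeed (exit status 0) if gcloud reports an ACTIVE credentialed account.
# has_active_gcloud_account is a hypothetical helper, not part of gcloud or bdutil.
has_active_gcloud_account() {
  local account
  # Print only the account column of entries whose status is ACTIVE.
  account=$(gcloud auth list --filter=status:ACTIVE --format='value(account)' 2>/dev/null)
  [ -n "$account" ]
}

# Usage: only prompt for a login when no account is active yet.
# has_active_gcloud_account || gcloud auth login
```

    If gcloud is not installed at all, the command substitution is empty and the function reports failure, so the usage line falls through to the login command (which then fails visibly).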

    Install bdutil

    At the moment, there is no bdutil release yet that includes the Flink extension. However, you can get the latest version of bdutil with Flink support from GitHub:

        git clone https://github.com/GoogleCloudPlatform/bdutil.git

    After you have downloaded the source, change into the newly created bdutil directory and continue with the next steps.
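
    Before continuing, it can help to verify that the command-line tools used in the rest of this guide are on your PATH. The sketch below is a hypothetical helper (require_tools is not part of bdutil); it uses only the shell builtin command -v.

```bash
#!/usr/bin/env bash
# Sketch: report every tool that cannot be found on PATH.
# require_tools is a hypothetical helper; it returns 0 only if all tools exist.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage from inside the cloned bdutil directory:
# require_tools gcloud gsutil git && [ -x ./bdutil ] || echo "prerequisites incomplete" >&2
```

    The loop checks every tool instead of stopping at the first missing one, so a single run lists everything that still needs to be installed.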

    Deploying Flink on Google Compute Engine

    Set up a bucket

    If you have not done so, create a bucket for the bdutil config and staging files. A new bucket can be created with gsutil:

        gsutil mb gs://<bucket_name>
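
    Cloud Storage rejects bucket names that break its naming rules. The following sketch checks a simplified subset of those rules locally before calling gsutil: 3 to 63 characters, only lowercase letters, digits, dashes, underscores and dots, starting and ending with a letter or digit. It does not cover every rule (for example, longer dot-separated names or reserved prefixes), and both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: simplified Cloud Storage bucket-name check (not the complete rule set).
# is_valid_bucket_name and create_config_bucket are hypothetical helpers.
is_valid_bucket_name() {
  local name=$1
  # 3-63 chars: first and last char a lowercase letter or digit,
  # middle chars lowercase letters, digits, '.', '_' or '-'.
  [[ $name =~ ^[[:lower:][:digit:]][[:lower:][:digit:]._-]{1,61}[[:lower:][:digit:]]$ ]]
}

create_config_bucket() {
  local name=$1
  if ! is_valid_bucket_name "$name"; then
    echo "invalid bucket name: $name" >&2
    return 1
  fi
  gsutil mb "gs://$name"
}

# Usage:
# create_config_bucket my-flink-bucket
```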

    Adapt the bdutil config

    To deploy Flink with bdutil, adapt at least the following variables in bdutil_env.sh:

        CONFIGBUCKET="<bucket_name>"
        PROJECT="<compute_engine_project_name>"
        NUM_WORKERS=<number_of_workers>
        # set this to 'n1-standard-2' if you're using the free trial
        GCE_MACHINE_TYPE="<gce_machine_type>"
        # for example: "europe-west1-d"
        GCE_ZONE="<gce_zone>"
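
    If you prefer to script these edits instead of opening bdutil_env.sh in an editor, a small awk-based helper can rewrite one variable at a time. This is a sketch under the assumption that each variable is assigned on its own line starting with NAME=; set_bdutil_var is a hypothetical name, and the example values are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: set NAME="VALUE" in a shell env file such as bdutil_env.sh.
# Assumes each assignment starts at the beginning of a line ("NAME=...").
# set_bdutil_var is a hypothetical helper; names not found are appended.
set_bdutil_var() {
  local file=$1 name=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  # Pass name and value through the environment so awk -v escape
  # processing does not touch backslashes in them.
  NAME=$name VALUE=$value awk '
    index($0, ENVIRON["NAME"] "=") == 1 {
      print ENVIRON["NAME"] "=\"" ENVIRON["VALUE"] "\""; found = 1; next
    }
    { print }
    END { if (!found) print ENVIRON["NAME"] "=\"" ENVIRON["VALUE"] "\"" }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat so the original file keeps its permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (placeholder values):
# set_bdutil_var bdutil_env.sh CONFIGBUCKET my-flink-bucket
# set_bdutil_var bdutil_env.sh NUM_WORKERS 2
# set_bdutil_var bdutil_env.sh GCE_ZONE europe-west1-d
```

    Matching on the prefix NAME= (including the equals sign) avoids accidentally rewriting a longer variable that merely starts with the same name.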

    Adapt the Flink config

    bdutil’s Flink extension handles the configuration for you. You may additionally adjust configuration variables in extensions/flink/flink_env.sh. For further configuration, please take a look at configuring Flink. After changing Flink’s configuration, you have to restart Flink using bin/stop-cluster.sh and bin/start-cluster.sh.
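
    The restart described above can be wrapped in a small function that fails early if the scripts are missing. This sketch assumes the installation directory /home/hadoop/flink-install that this guide uses for the example job, and the stop-cluster.sh and start-cluster.sh scripts in Flink’s bin directory; restart_flink is a hypothetical name. Run it on the cluster, for example after ./bdutil shell.

```bash
#!/usr/bin/env bash
# Sketch: stop and restart a standalone Flink cluster after a config change.
# restart_flink is a hypothetical helper; the default path is an assumption
# taken from the bdutil layout used elsewhere in this guide.
restart_flink() {
  local flink_home=${1:-/home/hadoop/flink-install}
  local script
  # Check both scripts first so we never stop a cluster we cannot restart.
  for script in stop-cluster.sh start-cluster.sh; do
    if [ ! -x "$flink_home/bin/$script" ]; then
      echo "not found or not executable: $flink_home/bin/$script" >&2
      return 1
    fi
  done
  "$flink_home/bin/stop-cluster.sh" && "$flink_home/bin/start-cluster.sh"
}

# Usage on the cluster:
# restart_flink /home/hadoop/flink-install
```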

    Bring up a cluster with Flink

    To bring up the Flink cluster on Google Compute Engine, execute:

        ./bdutil -e extensions/flink/flink_env.sh deploy
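
    Before running the deploy command, you may want to confirm that the variables from bdutil_env.sh were actually filled in. The sketch below sources the file in a subshell and flags values that are empty or still contain a <...> placeholder. check_bdutil_env is a hypothetical helper, and it assumes the file can be sourced on its own.

```bash
#!/usr/bin/env bash
# Sketch: verify that the variables this guide asks you to adapt are set.
# check_bdutil_env is a hypothetical helper. It sources the file in a
# subshell, so the caller's environment is left untouched.
check_bdutil_env() {
  local file=${1:-./bdutil_env.sh}
  (
    # shellcheck disable=SC1090
    . "$file" || exit 1
    status=0
    for var in CONFIGBUCKET PROJECT NUM_WORKERS GCE_MACHINE_TYPE GCE_ZONE; do
      value=${!var:-}  # indirect expansion: value of the variable named $var
      if [ -z "$value" ]; then
        echo "$var is empty in $file" >&2
        status=1
      elif [[ $value == *'<'*'>'* ]]; then
        echo "$var still holds a placeholder: $value" >&2
        status=1
      fi
    done
    exit "$status"
  )
}

# Usage:
# check_bdutil_env ./bdutil_env.sh && ./bdutil -e extensions/flink/flink_env.sh deploy
```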
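
    Run a Flink example job

    Open a shell on the cluster with bdutil, change into the Flink bin directory, and submit the bundled WordCount example. It reads othello.txt from the gs://dataflow-samples bucket and writes its result to your own bucket:

        ./bdutil shell
        cd /home/hadoop/flink-install/bin
        ./flink run ../examples/batch/WordCount.jar gs://dataflow-samples/shakespeare/othello.txt gs://<bucket_name>/output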
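
    To sanity-check the job’s result, you can compute a rough local word count with standard Unix tools. This is an approximation, not Flink’s implementation: it assumes that words are lowercased runs of letters, digits and underscores, and it prints one "word count" pair per line. local_wordcount is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: approximate word count of stdin with standard Unix tools.
# local_wordcount is a hypothetical helper; the tokenization here (lowercased
# runs of [A-Za-z0-9_]) is an assumption and may differ from Flink's example.
local_wordcount() {
  tr -cs 'A-Za-z0-9_' '\n' |      # split on every other character
    tr '[:upper:]' '[:lower:]' |  # normalize case
    grep -v '^$' |                # drop empty tokens
    sort | uniq -c |              # count identical words
    awk '{ print $2, $1 }'        # print "word count"
}

# Usage on a local copy of the input text:
# gsutil cp gs://dataflow-samples/shakespeare/othello.txt .
# local_wordcount < othello.txt | sort -k2,2nr | head
```

    Small differences against the Flink output are expected if the example tokenizes slightly differently; large differences in the most frequent words are a better signal that something went wrong.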

    Shut down your cluster

    Shutting down a cluster is as simple as executing:

        ./bdutil -e extensions/flink/flink_env.sh delete
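
    Because delete shuts the whole cluster down, scripts may want to ask for confirmation first. The confirm function below is a hypothetical helper built on the bash read builtin; it only succeeds on an explicit yes.

```bash
#!/usr/bin/env bash
# Sketch: ask a yes/no question; succeed only on an explicit "y" or "yes".
# confirm is a hypothetical helper. read -p shows the prompt only when
# stdin is a terminal; end of input counts as "no".
confirm() {
  local prompt="${1:-Are you sure?}" reply
  read -r -p "$prompt [y/N] " reply || return 1
  case $reply in
    [yY] | [yY][eE][sS]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# confirm "Delete the Flink cluster?" && ./bdutil -e extensions/flink/flink_env.sh delete
```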