Cloud Build, Packer, and Qualys (Automated Golden Images with Vuln Scanning POC)

A few months ago, I wanted to set up a CI/CD pipeline and started with a Jenkins server. Right away, I wasn’t a fan of the GUI, and learning it didn’t seem like much fun, so I looked for an alternative and found the community around Cloud Build.

If you haven’t used Cloud Build before, I’m hoping that reviewing my current proof of concept (POC) will give you some familiarity with it. Unlike most of my posts, I’m not going to post my own code this time since it’s still a work in progress (more updates later).

Getting started

Cloud Build is Google Cloud Platform (GCP)’s CI/CD service. It lets you connect a Github (or other version control) repository and create builds (container images, virtual machine images, etc.) based on the steps listed in a cloudbuild.yml file or a Dockerfile. You can create triggers that start a build automatically on a pull request, or only on pushes to specific branches. For my most recent project, I’ve been trying to automate the creation of hardened golden images via a Slack bot or GCP Cloud Scheduler (which is used to run cron jobs).

The POC I’ve been working on is below but first, there are two caveats:

  1. This is the POC using a Slack bot, not Cloud Scheduler.
  2. This only covers building a golden image. The Slack bot has some other functionality, but I wanted to keep the diagram focused.

I’m going to try to break this down into 10 steps:

  1. The user initializes the process with a slash command (/imagebot build:debian-9) in Slack. This sends a JSON payload to a waiting Cloud Function.
  2. This Cloud Function serves as a checkpoint. It will only forward the payload to the next Cloud Function if it meets a few requirements. The initializing slash command must come from me, from within my channel, and contain only a single key-value pair with a supported command and operating system. Build is one accepted command, and Debian-9 is one accepted operating system. With both of those, and the other two checks, the payload can then be passed to the next Cloud Function to start the building process.
  3. The Build command in the payload will run the Cloud Build trigger for the Debian-9 branch in my Github repository.
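The checkpoint in step 2 could look roughly like the Go sketch below. It is not the POC’s actual code: the request struct, the allowlist values (U0000EXAMPLE, C0000EXAMPLE) and the validate function are all made up for illustration, and the way the fields get extracted from the Slack payload is left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical allowlists; the real user ID, channel ID, and supported
// values are not given in the post.
var (
	allowedUsers    = map[string]bool{"U0000EXAMPLE": true}
	allowedChannels = map[string]bool{"C0000EXAMPLE": true}
	allowedCommands = map[string]bool{"build": true}
	allowedOS       = map[string]bool{"debian-9": true, "debian-10": true}
)

// request holds the fields the checkpoint cares about.
type request struct {
	UserID    string
	ChannelID string
	Text      string // e.g. "build:debian-9"
}

// validate returns the command and operating system if all four checks pass.
func validate(r request) (command, osName string, err error) {
	if !allowedUsers[r.UserID] {
		return "", "", errors.New("user not allowed")
	}
	if !allowedChannels[r.ChannelID] {
		return "", "", errors.New("channel not allowed")
	}
	parts := strings.Split(strings.TrimSpace(r.Text), ":")
	if len(parts) != 2 {
		return "", "", errors.New("expected exactly one command:os pair")
	}
	command, osName = strings.ToLower(parts[0]), strings.ToLower(parts[1])
	if !allowedCommands[command] {
		return "", "", fmt.Errorf("unsupported command %q", command)
	}
	if !allowedOS[osName] {
		return "", "", fmt.Errorf("unsupported operating system %q", osName)
	}
	return command, osName, nil
}

func main() {
	cmd, osName, err := validate(request{
		UserID:    "U0000EXAMPLE",
		ChannelID: "C0000EXAMPLE",
		Text:      "build:debian-9",
	})
	fmt.Println(cmd, osName, err)
}
```

Rejecting anything that isn’t on an allowlist (rather than trying to block known-bad input) keeps the function small and makes it hard to smuggle extra arguments through the slash command.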

*Cloud Build needs some configuration up front so that it has read permissions on the repository. This is because it copies the repo to Cloud Source Repository (Google Cloud’s source control) and uses that copy during the build. Mine is configured to always keep a 1:1 mirror of my Github content in my Cloud Source Repo. That is to say, if I add a file to my Github repo, it is immediately mirrored to the corresponding Cloud Source repo.

Here is part of my current setup (sorry in advance for the bad tree example; the entries below are branches, not directories):

└── Github Repo
    ├── debian-9-branch
    │   ├── cloudbuild.yml
    │   ├── debian-9.json
    │   └── scripts/harden.sh
    ├── debian-10-branch
    │   ├── cloudbuild.yml
    │   ├── debian-10.json
    │   └── scripts/harden.sh
    ├── master branch
    └── dev branch

Each branch has a corresponding trigger that can be run. Running a trigger starts the Cloud Build process, which begins with a cloudbuild.yml or Dockerfile.
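For step 3, one way to run a trigger from Go is the Cloud Build REST method projects.triggers.run, which takes a RepoSource body naming the branch. The sketch below only builds the request and doesn’t send it. The project ID, trigger ID and the newRunTriggerRequest helper are placeholders I made up, and authentication (an OAuth2 bearer token) is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// repoSource mirrors the one RepoSource field this sketch needs.
type repoSource struct {
	BranchName string `json:"branchName"`
}

// newRunTriggerRequest builds (but does not send) a triggers.run call.
// A real call also needs an "Authorization: Bearer ..." header.
func newRunTriggerRequest(projectID, triggerID, branch string) (*http.Request, error) {
	body, err := json.Marshal(repoSource{BranchName: branch})
	if err != nil {
		return nil, err
	}
	endpoint := fmt.Sprintf(
		"https://cloudbuild.googleapis.com/v1/projects/%s/triggers/%s:run",
		url.PathEscape(projectID), url.PathEscape(triggerID))
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// "my-project" and "debian-9-trigger" are hypothetical identifiers.
	req, err := newRunTriggerRequest("my-project", "debian-9-trigger", "debian-9-branch")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```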

4. Since Cloud Build takes a .yml file and performs the steps listed in it, it needs a few things configured to run successfully. At some point while setting this up, I made a Cloud Build trigger for Packer. That means I forked the Cloud Build community Packer repository, set up some configuration, and ran the trigger in Cloud Build. This exported a Packer image to Google Container Registry (GCR). I need to call this image (Packer) during the build process that creates my hardened virtual machine image.

With the Packer image ready to go, when I trigger the process, Cloud Build runs through the steps in the yaml file. Packer will create a temp instance, perform all the configuration, run scripts, build an image, and terminate everything except the image. The image is then exported to GCR. The scripts that run on the instance are an excellent place to harden the host up to the CIS benchmark standards.

5. A third and final Cloud Function is subscribed to the Cloud Build Topic that was created by default when the Cloud Build API was enabled (I think it gets created; I might have created it…). Any build that has started, is in progress, or has finished will post status updates to the topic during its build cycle. I have a Cloud Function that monitors for specific, successful builds.

6. If it sees the right one, it will create an instance, wait for the instance to start, and get the instance’s external IP using the Compute API. It will then generate an RSA public/private key pair in memory and attach the public key to the instance using the SetInstanceMetadata API call. This makes each key unique and one-time use, since it is created during the runtime of the Cloud Function.

7. It will then reach out to the Qualys API and add the external IP to the Qualys Host Subscription. You cannot scan a host that is not part of your subscription. It will also make an API call to add the private key as a Unix Record so the Qualys external scanners can authenticate to the instance for the most accurate scan results.
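The “monitor for specific, successful builds” part of step 5 can be sketched as a small filter over the message data. As far as I know, each Pub/Sub message carries a JSON copy of the Build resource, so the sketch reads its status field and, as an assumption on my part, the BRANCH_NAME substitution to decide whether this is the build we care about. The buildEvent type and the isTargetBuild name are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildEvent holds the subset of the Build resource this sketch reads.
type buildEvent struct {
	ID            string            `json:"id"`
	Status        string            `json:"status"`
	Substitutions map[string]string `json:"substitutions"`
}

// isTargetBuild reports whether a message describes a successful build
// of the wanted branch. Relying on BRANCH_NAME is an assumption.
func isTargetBuild(data []byte, wantBranch string) (bool, error) {
	var ev buildEvent
	if err := json.Unmarshal(data, &ev); err != nil {
		return false, fmt.Errorf("decode build event: %w", err)
	}
	return ev.Status == "SUCCESS" && ev.Substitutions["BRANCH_NAME"] == wantBranch, nil
}

func main() {
	// A hand-written sample message, not captured from a real build.
	msg := []byte(`{"id":"1234","status":"SUCCESS","substitutions":{"BRANCH_NAME":"debian-9-branch"}}`)
	ok, err := isTargetBuild(msg, "debian-9-branch")
	fmt.Println(ok, err)
}
```

Since every status change (queued, working, success, failure) lands on the same topic, the function should return quickly for anything that isn’t a match.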
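Here is a minimal, standard-library-only sketch of the in-memory key generation from step 6. It produces a PEM-encoded private key and an entry in the "USERNAME:ssh-rsa KEY USERNAME" shape that Compute Engine expects in the ssh-keys metadata value. The helper names and the "scanner" username are hypothetical, and in practice the golang.org/x/crypto/ssh package would replace the hand-rolled public key encoding.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/base64"
	"encoding/binary"
	"encoding/pem"
	"fmt"
	"math/big"
)

// writeField appends a length-prefixed field, as used by the SSH wire format.
func writeField(buf *bytes.Buffer, b []byte) {
	var n [4]byte
	binary.BigEndian.PutUint32(n[:], uint32(len(b)))
	buf.Write(n[:])
	buf.Write(b)
}

// mpint encodes a positive integer, adding a leading zero byte when the
// high bit is set so it is not read as negative.
func mpint(i *big.Int) []byte {
	b := i.Bytes()
	if len(b) > 0 && b[0]&0x80 != 0 {
		b = append([]byte{0}, b...)
	}
	return b
}

// authorizedKey renders an RSA public key as an OpenSSH "ssh-rsa ..." string.
func authorizedKey(pub *rsa.PublicKey) string {
	var buf bytes.Buffer
	writeField(&buf, []byte("ssh-rsa"))
	writeField(&buf, mpint(big.NewInt(int64(pub.E))))
	writeField(&buf, mpint(pub.N))
	return "ssh-rsa " + base64.StdEncoding.EncodeToString(buf.Bytes())
}

// newKeyPair returns a PEM private key and a metadata entry of the form
// "user:ssh-rsa KEY user". Nothing is written to disk.
func newKeyPair(user string) (privPEM []byte, sshKeysEntry string, err error) {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, "", err
	}
	privPEM = pem.EncodeToMemory(&pem.Block{
		Type:  "RSA PRIVATE KEY",
		Bytes: x509.MarshalPKCS1PrivateKey(key),
	})
	return privPEM, fmt.Sprintf("%s:%s %s", user, authorizedKey(&key.PublicKey), user), nil
}

func main() {
	_, entry, err := newKeyPair("scanner") // "scanner" is a made-up username
	if err != nil {
		panic(err)
	}
	fmt.Println(entry[:40] + "...")
}
```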

If you’ve never used Qualys, it is a Software-as-a-Service platform for vulnerability and compliance scanning. You deploy one of their scanners to your network, or an agent to your instances, and they report back to the Qualys platform, where you log in and view information about your hosts. This is a little different from setting up a vulnerability scanner in your own network and launching a scan. Here I’m using one of Qualys’s many external scanners, which Qualys owns and keeps on standby for customers to use.

8. Right now, a firewall rule is created every time a build is run. This rule allows ingress from the Qualys IP range to hosts with specific tags in the network. After the firewall rule has been created and the keys added, a scan is initiated.

Vulnerability scans have been taking around 4–5 minutes per host, plus less than 2 minutes to start an instance and configure the firewall and keys. That gives me enough time to use a Cloud Function (which times out after 9 minutes) to wait for the scan to complete and perform a few cleanup tasks.

9. This is currently where I’m at for this specific command (I’ve been working on some other ones), so the rest is extra extra POC at this point (I have the rest of the Go logic, it’s just not implemented right now).
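To give an idea of what the firewall rule in step 8 involves, the sketch below builds the JSON body for a Compute Engine firewalls.insert call without sending it. The rule name, the network tag and the 203.0.113.0/24 range are placeholders (the real Qualys scanner ranges come from your Qualys account), and allowing all TCP ports is my assumption about what a scan needs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allowed mirrors one entry of the Firewall resource's "allowed" list.
// Leaving Ports empty means the rule applies to every port.
type allowed struct {
	IPProtocol string   `json:"IPProtocol"`
	Ports      []string `json:"ports,omitempty"`
}

// firewallRule holds the subset of the Firewall resource used here.
type firewallRule struct {
	Name         string    `json:"name"`
	Network      string    `json:"network"`
	Direction    string    `json:"direction"`
	SourceRanges []string  `json:"sourceRanges"`
	TargetTags   []string  `json:"targetTags"`
	Allowed      []allowed `json:"allowed"`
}

// scannerIngressRule allows TCP ingress from the scanner ranges to
// instances carrying one of the given network tags.
func scannerIngressRule(name, network string, scannerCIDRs, tags []string) firewallRule {
	return firewallRule{
		Name:         name,
		Network:      network,
		Direction:    "INGRESS",
		SourceRanges: scannerCIDRs,
		TargetTags:   tags,
		Allowed:      []allowed{{IPProtocol: "tcp"}},
	}
}

// JSON renders the rule as the request body.
func (r firewallRule) JSON() (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	// All values below are placeholders, not real Qualys ranges.
	rule := scannerIngressRule("allow-qualys-scan", "global/networks/default",
		[]string{"203.0.113.0/24"}, []string{"qualys-scan"})
	body, err := rule.JSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Scoping the rule with target tags means only the temporary scan instance is exposed, not every host in the network.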
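Waiting inside a function with a hard timeout comes down to polling against a time budget. Below is a minimal sketch with a made-up waitForScan helper: it calls a status check on an interval and gives up once the budget is spent, so there is still time left for cleanup. The check function here is a stand-in, not a real Qualys call.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForScan polls check every interval until it reports done, returns
// an error, or the time budget runs out.
func waitForScan(budget, interval time.Duration, check func() (bool, error)) error {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		done, err := check()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("scan did not finish within %s: %w", budget, ctx.Err())
		case <-ticker.C:
			// Time for the next poll.
		}
	}
}

func main() {
	// An 8-minute budget would leave headroom under a 9-minute function
	// timeout; the short interval just keeps this demo fast.
	calls := 0
	err := waitForScan(8*time.Minute, 10*time.Millisecond, func() (bool, error) {
		calls++
		return calls >= 3, nil // pretend the scan finishes on the third poll
	})
	fmt.Println(calls, err)
}
```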

10. After the scan is finished, the scan will be uploaded to GCS and an automated message pushed to Slack.

One thing I left out is the use of GCP Secret Manager. When the third Cloud Function makes calls out to Qualys, it gets the API password from Secret Manager and the username from an environment variable. This keeps me from hard-coding any passwords.
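As a rough illustration of the Secret Manager lookup, the sketch below uses the REST shape of versions.access: the secret comes back base64-encoded in payload.data, which encoding/json decodes straight into a []byte. The project name, the secret name, the QUALYS_USERNAME variable and both helpers are assumptions of mine, and the HTTP call itself (with its auth token) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// accessResponse mirrors the part of the versions.access response we need.
type accessResponse struct {
	Payload struct {
		Data []byte `json:"data"` // base64 in JSON, decoded by encoding/json
	} `json:"payload"`
}

// accessURL returns the REST endpoint for the latest version of a secret.
func accessURL(project, secret string) string {
	return fmt.Sprintf(
		"https://secretmanager.googleapis.com/v1/projects/%s/secrets/%s/versions/latest:access",
		project, secret)
}

// secretValue extracts the plaintext secret from a response body.
func secretValue(body []byte) (string, error) {
	var resp accessResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", fmt.Errorf("decode secret response: %w", err)
	}
	if len(resp.Payload.Data) == 0 {
		return "", fmt.Errorf("response has no payload data")
	}
	return string(resp.Payload.Data), nil
}

func main() {
	// QUALYS_USERNAME is a hypothetical environment variable name.
	username := os.Getenv("QUALYS_USERNAME")

	// A hand-written sample response; "aHVudGVyMg==" is base64 for "hunter2".
	sample := []byte(`{"name":"projects/my-project/secrets/qualys-password/versions/1","payload":{"data":"aHVudGVyMg=="}}`)
	password, err := secretValue(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(accessURL("my-project", "qualys-password"))
	fmt.Println(username != "", len(password))
}
```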

Things I may add

  • Since these hosts are just base images and shouldn’t contain any sensitive info, I plan to export the scan results directly to Slack on completion.
  • Since all GCP instances come with gsutil built in, I could potentially validate the host with Chef InSpec during the hardening scripts and upload the results to GCS.

I’m not sure how much more effort I’ll put into this but it’s been fun since I’m doing it all in Go and on GCP. Hopefully this POC explanation helps someone. If you made it to the end, thank you. Best wishes everyone and stay safe out there!

p.s. I’m still going to learn Jenkins