Build An Alexa Skill With Golang And AWS Lambda

It has been a few years since I last built and published an application, otherwise known as a Skill, for Alexa-powered voice assistants. My last Skill, titled BART Control, was built out of necessity because of my commute on the Bay Area Rapid Transit system. While I didn’t open source it, I created the Skill with Node.js and a publicly available BART web service. Since then I wrote a tutorial titled Create an Amazon Alexa Skill Using Node.js and AWS Lambda, which also focused on Node.js.

I’m a huge fan of Golang and was pleased to see that AWS Lambda recently started to officially support it. AWS Lambda isn’t a requirement for creating Alexa Skills, but it is a huge convenience. To make things even better, Amazon recently sent me an invitation to take part in their developer offer to receive an Amazon Echo Show for publishing another Skill. The offer and Golang inspired me to develop another Skill, and this time I wanted to share my process.

Before starting development, I had to come up with an idea for the Skill. I wouldn’t want to pollute Amazon with garbage like a fart Skill in the sea of probably 1,000 or so other fart Skills. I wanted to build something that I’d actually use or for a service that I actually use.

Fast forward, I decided to build a Skill for a deal hunting site that I use called Slickdeals:

Slickdeals

My thoughts were that I’d have Alexa read me the latest deals rather than me having to navigate the website, even though the website isn’t difficult to navigate. So this made me wonder how I’d do this, more specifically because Slickdeals doesn’t have any APIs for data consumption.

Slickdeals, like many websites, has an RSS feed, which is XML data. We can consume the RSS feed in our Golang application, clean it up, then have Alexa present it back to the end user.

Developing an AWS Lambda Function with Golang

Before we worry about building an Alexa Skill, we need to take a step back and worry about creating an AWS Lambda function. At the end of the day, our logic will be an AWS Lambda function that returns data formatted specifically for Alexa.

Because AWS Lambda now supports Golang, we have a nice SDK available to us. To download the Lambda SDK, execute the following:

go get github.com/aws/aws-lambda-go/lambda

If this is your first time dabbling with AWS Lambda in Golang, just think of it as a fancy way to execute your Go functions. Requests to the Lambda function have a specific payload that the SDK takes care of. The SDK also takes care of the responses. You just need to worry about everything in between, just like you would every other application.

Within your $GOPATH path, create a project directory, and within that directory create a main.go file with the following boilerplate code:

package main

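import (
    // Packages such as encoding/xml, io/ioutil, net/http, and net/url are
    // needed later by RequestFeed. Add them at that point, because Go
    // refuses to compile a file that has unused imports.
    "github.com/aws/aws-lambda-go/lambda"
)

// Handler is invoked by Lambda and returns a plain string for now.
func Handler() (string, error) {
    return "Hello World", nil
}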

func main() {
    lambda.Start(Handler)
}

The above code is incredibly basic, but the good news is that our final application won’t be too much more difficult. Essentially what happens is the AWS Lambda service will run our main function which starts our Handler function. Our function returns a string which Lambda will return to the user. Not particularly useful, but at least we have a starting point.
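To make the idea of the SDK handling the payload a little more concrete, here is a rough simulation of the round trip using only the standard library: a JSON payload is decoded into the handler's input type, the handler runs, and its return value is encoded back to JSON. The `GreetingEvent` type and the `invoke` helper are hypothetical names made up for this sketch, and this is not how the SDK is actually implemented.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GreetingEvent is a hypothetical input payload used only for this sketch.
type GreetingEvent struct {
	Name string `json:"name"`
}

// Handler has the shape of a typed handler: one input, one output plus an error.
func Handler(event GreetingEvent) (string, error) {
	return "Hello " + event.Name, nil
}

// invoke loosely imitates the work done around a handler:
// decode the JSON payload, call the handler, encode the result.
func invoke(payload []byte) ([]byte, error) {
	var event GreetingEvent
	if err := json.Unmarshal(payload, &event); err != nil {
		return nil, err
	}
	result, err := Handler(event)
	if err != nil {
		return nil, err
	}
	return json.Marshal(result)
}

func main() {
	out, err := invoke([]byte(`{"name": "World"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The takeaway is that the handler itself only ever sees Go values, which is why swapping the string for Alexa request and response structs later is a small change.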

To upload our application, there are a few steps involved. If you’re developing on Mac or Linux, you can execute the following:

GOOS=linux go build
zip handler.zip ./binary-name

The above commands will build a binary for Linux and add it to a ZIP archive. The above assumes that your application is called binary-name and that you have a zip command in your Terminal. If you’re on Windows, the steps will be similar, but use your best judgement on the changes.

With the handler.zip file in hand, we need to create a new function in the AWS Management Console. When creating a new function, give it a name, but make sure you declare it as a Go application. The defaults should be fine for everything along the way.

When you’re done with the creation wizard, you should be brought to something like the following:

Slick Dealer on AWS Lambda

For the trigger, make sure to choose Alexa Skills Kit, even though we won’t be using it quite yet. It may ask you to provide your Skill ID, but you can just disable that check for now. In production you’ll want it enabled so people can’t abuse your function. You’ll also want to upload your handler.zip file and specify the name of your binary.

When it is uploaded, if you test your Skill using the AWS Lambda testing features, it should return your string.

What we have is nice, but not very exciting. Now we need to focus on integrating Alexa support and give it some functionality for scraping the Slickdeals RSS feed and presenting the data.

Building a Slickdeals Amazon Alexa Skill

Like I mentioned previously, Alexa issues requests in a certain format and expects responses in a certain format. Getting started with Alexa and Golang, I came across a package by Arien Malec called alexa-go. This library essentially is a bunch of data structures that map requests and responses. They don’t provide any actual logic and in most circumstances, this package is great for your project. We’ll be using an edited version of this package, but I just wanted to give credit where credit is due.

Within your project create an alexa directory with a request.go file that contains the following:

package alexa

const (
	HelpIntent   = "AMAZON.HelpIntent"
	CancelIntent = "AMAZON.CancelIntent"
	StopIntent   = "AMAZON.StopIntent"
)

type Request struct {
	Version string  `json:"version"`
	Session Session `json:"session"`
	Body    ReqBody `json:"request"`
	Context Context `json:"context"`
}

type Session struct {
	New         bool   `json:"new"`
	SessionID   string `json:"sessionId"`
	Application struct {
		ApplicationID string `json:"applicationId"`
	} `json:"application"`
	Attributes map[string]interface{} `json:"attributes"`
	User       struct {
		UserID      string `json:"userId"`
		AccessToken string `json:"accessToken,omitempty"`
	} `json:"user"`
}

type Context struct {
	System struct {
		APIAccessToken string `json:"apiAccessToken"`
		Device         struct {
			DeviceID string `json:"deviceId,omitempty"`
		} `json:"device,omitempty"`
		Application struct {
			ApplicationID string `json:"applicationId,omitempty"`
		} `json:"application,omitempty"`
	} `json:"System,omitempty"`
}

type ReqBody struct {
	Type        string `json:"type"`
	RequestID   string `json:"requestId"`
	Timestamp   string `json:"timestamp"`
	Locale      string `json:"locale"`
	Intent      Intent `json:"intent,omitempty"`
	Reason      string `json:"reason,omitempty"`
	DialogState string `json:"dialogState,omitempty"`
}

type Intent struct {
	Name  string          `json:"name"`
	Slots map[string]Slot `json:"slots"`
}

type Slot struct {
	Name        string      `json:"name"`
	Value       string      `json:"value"`
	Resolutions Resolutions `json:"resolutions"`
}

type Resolutions struct {
	ResolutionPerAuthority []struct {
		Values []struct {
			Value struct {
				Name string `json:"name"`
				Id   string `json:"id"`
			} `json:"value"`
		} `json:"values"`
	} `json:"resolutionsPerAuthority"`
}
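To see a mapping like the one above in action without involving Alexa at all, the following sketch decodes a hand-written payload with the standard library. The types are trimmed-down copies that keep only the fields being read, `decodeRequest` is a hypothetical helper, and the sample JSON is heavily shortened rather than captured from a real device.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed-down copies of the request types, keeping only what this sketch reads.
type Intent struct {
	Name string `json:"name"`
}

type ReqBody struct {
	Type   string `json:"type"`
	Intent Intent `json:"intent,omitempty"`
}

type Request struct {
	Version string  `json:"version"`
	Body    ReqBody `json:"request"`
}

// sampleRequest is a hand-written, shortened payload for illustration only.
const sampleRequest = `{
  "version": "1.0",
  "request": {
    "type": "IntentRequest",
    "intent": {"name": "FrontpageDealIntent"}
  }
}`

// decodeRequest unmarshals raw JSON into the Request mapping.
func decodeRequest(raw []byte) (Request, error) {
	var req Request
	err := json.Unmarshal(raw, &req)
	return req, err
}

func main() {
	req, err := decodeRequest([]byte(sampleRequest))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(req.Body.Type, req.Body.Intent.Name)
}
```

Fields that are present in the JSON but missing from the struct are simply ignored, so the mapping only needs to cover what the Skill actually uses.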

If you look at any request that Alexa makes, it should have a JSON structure that matches the data structures in the above code. Remember, all we’re doing is mapping the request so it can be easily used within our application. Similarly we want to create an alexa/response.go file in our project with the following code:

package alexa

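import "strings" // used by ParseString, which is added to this file later; leave it out until then if the compiler complains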

func NewSimpleResponse(title string, text string) Response {
	r := Response{
		Version: "1.0",
		Body: ResBody{
			OutputSpeech: &Payload{
				Type: "PlainText",
				Text: text,
			},
			Card: &Payload{
				Type:    "Simple",
				Title:   title,
				Content: text,
			},
			ShouldEndSession: true,
		},
	}
	return r
}

type Response struct {
	Version           string                 `json:"version"`
	SessionAttributes map[string]interface{} `json:"sessionAttributes,omitempty"`
	Body              ResBody                `json:"response"`
}

type ResBody struct {
	OutputSpeech     *Payload     `json:"outputSpeech,omitempty"`
	Card             *Payload     `json:"card,omitempty"`
	Reprompt         *Reprompt    `json:"reprompt,omitempty"`
	Directives       []Directives `json:"directives,omitempty"`
	ShouldEndSession bool         `json:"shouldEndSession"`
}

type Reprompt struct {
	OutputSpeech Payload `json:"outputSpeech,omitempty"`
}

type Directives struct {
	Type          string         `json:"type,omitempty"`
	SlotToElicit  string         `json:"slotToElicit,omitempty"`
	UpdatedIntent *UpdatedIntent `json:"updatedIntent,omitempty"`
	PlayBehavior  string         `json:"playBehavior,omitempty"`
	AudioItem     struct {
		Stream struct {
			Token                string `json:"token,omitempty"`
			URL                  string `json:"url,omitempty"`
			OffsetInMilliseconds int    `json:"offsetInMilliseconds,omitempty"`
		} `json:"stream,omitempty"`
	} `json:"audioItem,omitempty"`
}

type UpdatedIntent struct {
	Name               string                 `json:"name,omitempty"`
	ConfirmationStatus string                 `json:"confirmationStatus,omitempty"`
	Slots              map[string]interface{} `json:"slots,omitempty"`
}

type Image struct {
	SmallImageURL string `json:"smallImageUrl,omitempty"`
	LargeImageURL string `json:"largeImageUrl,omitempty"`
}

type Payload struct {
	Type    string `json:"type,omitempty"`
	Title   string `json:"title,omitempty"`
	Text    string `json:"text,omitempty"`
	SSML    string `json:"ssml,omitempty"`
	Content string `json:"content,omitempty"`
	Image   Image  `json:"image,omitempty"`
}

The above request.go and response.go files get us up to speed with what Arien Malec provided in his Github repository. Again, they are more or less just mappings of the requests and responses for Alexa. We’re going to be altering the response.go file as we progress with this example.
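If you are curious what these response mappings turn into on the wire, here is a small sketch that marshals a simple response to JSON. It uses trimmed copies of the types (the image, reprompt, and directive fields are left out), and `encodeResponse` is a hypothetical helper added just for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed copies of the response types from response.go.
type Payload struct {
	Type    string `json:"type,omitempty"`
	Title   string `json:"title,omitempty"`
	Text    string `json:"text,omitempty"`
	Content string `json:"content,omitempty"`
}

type ResBody struct {
	OutputSpeech     *Payload `json:"outputSpeech,omitempty"`
	Card             *Payload `json:"card,omitempty"`
	ShouldEndSession bool     `json:"shouldEndSession"`
}

type Response struct {
	Version string  `json:"version"`
	Body    ResBody `json:"response"`
}

// NewSimpleResponse mirrors the constructor from the tutorial.
func NewSimpleResponse(title string, text string) Response {
	return Response{
		Version: "1.0",
		Body: ResBody{
			OutputSpeech:     &Payload{Type: "PlainText", Text: text},
			Card:             &Payload{Type: "Simple", Title: title, Content: text},
			ShouldEndSession: true,
		},
	}
}

// encodeResponse returns the compact JSON form of a response.
func encodeResponse(r Response) (string, error) {
	data, err := json.Marshal(r)
	return string(data), err
}

func main() {
	encoded, err := encodeResponse(NewSimpleResponse("About", "Hi"))
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(encoded)
}
```

Because of the `omitempty` tags, empty string fields such as the title on the speech payload are dropped from the output, which keeps the response small.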

Now let’s focus on the core logic of our Skill. Open the project’s main.go file and include the following:

func HandleFrontpageDealIntent(request alexa.Request) alexa.Response {
    return alexa.NewSimpleResponse("Frontpage Deals", "Frontpage deal data here")
}

func HandlePopularDealIntent(request alexa.Request) alexa.Response {
    return alexa.NewSimpleResponse("Popular Deals", "Popular deal data here")
}

func HandleHelpIntent(request alexa.Request) alexa.Response {
    return alexa.NewSimpleResponse("Help", "Help regarding the available commands here")
}

func HandleAboutIntent(request alexa.Request) alexa.Response {
    return alexa.NewSimpleResponse("About", "Slick Dealer was created by Nic Raboy in Tracy, California as an unofficial Slick Deals application.")
}

Above we have four different handler functions. They are handlers only in the sense that each one answers a particular intent; we are still only going to have a single Lambda function. We’re going to route to each of these functions depending on the data in the Alexa request. Notice that each of the four functions creates a new simple response with a string title and text to be read.

To route to each of these functions, we’re going to take a step back and design another function:

func IntentDispatcher(request alexa.Request) alexa.Response {
	var response alexa.Response
	switch request.Body.Intent.Name {
	case "FrontpageDealIntent":
		response = HandleFrontpageDealIntent(request)
	case "PopularDealIntent":
		response = HandlePopularDealIntent(request)
	case alexa.HelpIntent:
		response = HandleHelpIntent(request)
	case "AboutIntent":
		response = HandleAboutIntent(request)
	default:
		response = HandleAboutIntent(request)
	}
	return response
}

With a dispatcher type function, we can look at the intent that was sent in the Alexa request. Depending on the intent that Alexa sends, we’re going to route to the appropriate function. We need to take another step back to wire everything together. We need to look at our original Handler function:

func Handler(request alexa.Request) (alexa.Response, error) {
	return IntentDispatcher(request), nil
}

Instead of returning a string, we’re returning an alexa.Response as designed in our response.go file, but we’re also calling our IntentDispatcher function. So the flow of events is that Alexa will call our Lambda function with a request payload. The payload will be passed to our dispatcher and the appropriate function will be called based on the payload information.
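The flow described above can be tried locally before anything is deployed. The sketch below uses minimal stand-in types instead of the real alexa package (the `Request` and `Response` structs here are simplified assumptions, not the ones from request.go and response.go) and feeds a few intent names through the same kind of dispatcher.

```go
package main

import "fmt"

// Minimal stand-ins for alexa.Request and alexa.Response, for illustration only.
type Request struct{ IntentName string }
type Response struct{ Title string }

func HandleFrontpageDealIntent(request Request) Response {
	return Response{Title: "Frontpage Deals"}
}

func HandleHelpIntent(request Request) Response {
	return Response{Title: "Help"}
}

func HandleAboutIntent(request Request) Response {
	return Response{Title: "About"}
}

// IntentDispatcher routes on the intent name and falls back to the about intent.
func IntentDispatcher(request Request) Response {
	switch request.IntentName {
	case "FrontpageDealIntent":
		return HandleFrontpageDealIntent(request)
	case "AMAZON.HelpIntent":
		return HandleHelpIntent(request)
	default:
		return HandleAboutIntent(request)
	}
}

// Handler is the single entry point, just like the Lambda handler.
func Handler(request Request) (Response, error) {
	return IntentDispatcher(request), nil
}

func main() {
	for _, name := range []string{"FrontpageDealIntent", "AMAZON.HelpIntent", "SomethingElse"} {
		response, _ := Handler(Request{IntentName: name})
		fmt.Printf("%s -> %s\n", name, response.Title)
	}
}
```

Running the dispatcher like this makes it easy to confirm that unknown intents land on the default branch.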

Alright, so the HandleAboutIntent function looks good, but what about the other three? Let’s first look at the simplest, being the HandleHelpIntent function:

func HandleHelpIntent(request alexa.Request) alexa.Response {
	// Build the help text one sentence at a time.
	responseText := ""
	responseText += "Here are some of the things you can ask: "
	responseText += "Give me the frontpage deals. "
	responseText += "Give me the popular deals."
	return alexa.NewSimpleResponse("Slick Dealer Help", responseText)
}

In the above code, we are constructing a string and returning it for Alexa to read. If you did this, you might notice that Alexa doesn’t add much pause between sentences. You could add 100 more periods or commas, but it would still be a little funky. Instead, we should take advantage of Speech Synthesis Markup Language (SSML), where we can add meta information for Alexa to work off of.

To do this, we need to edit our response.go file. We need to include the following:

func NewSSMLResponse(title string, text string) Response {
	r := Response{
		Version: "1.0",
		Body: ResBody{
			OutputSpeech: &Payload{
				Type: "SSML",
				SSML: text,
			},
			ShouldEndSession: true,
		},
	}
	return r
}

type SSML struct {
	text  string
	pause string
}

type SSMLBuilder struct {
	SSML []SSML
}

func ParseString(text string) string {
	text = strings.ToLower(text)
	text = strings.Replace(text, "&", "and", -1)
	text = strings.Replace(text, "+", "plus", -1)
	text = strings.Replace(text, "@", "at", -1)
	text = strings.Replace(text, "w/", "with", -1)
	text = strings.Replace(text, "in.", "inches", -1)
	text = strings.Replace(text, "s/h", "shipping and handling", -1)
	text = strings.Replace(text, " ac ", " after coupon ", -1)
	text = strings.Replace(text, "fs", "free shipping", -1)
	text = strings.Replace(text, "f/s", "free shipping", -1)
	text = strings.Replace(text, "-", "", -1)
	text = strings.Replace(text, "™", "", -1)
	text = strings.Replace(text, "  ", " ", -1)
	return text
}

func (builder *SSMLBuilder) Say(text string) {
	text = ParseString(text)
	builder.SSML = append(builder.SSML, SSML{text: text})
}

func (builder *SSMLBuilder) Pause(pause string) {
	builder.SSML = append(builder.SSML, SSML{pause: pause})
}

func (builder *SSMLBuilder) Build() string {
	var response string
	for index, ssml := range builder.SSML {
		if ssml.text != "" {
			response += ssml.text + " "
		} else if ssml.pause != "" && index != len(builder.SSML)-1 {
			response += "<break time='" + ssml.pause + "ms'/> "
		}
	}
	return "<speak>" + response + "</speak>"
}

So what is happening in each of these functions or data structures? Well first, SSML makes use of XML tags to handle changes in speech. For this reason, we should probably create some kind of SSML builder. Our SSML will be much simpler than it could be, so we’re only going to keep track of text and pauses, hence the SSML data structure and the SSMLBuilder data structure. SSML is very picky about what the text can contain. When our text has special characters, which RSS feeds often do, Alexa will fail to read the response because the text is no longer valid SSML. Instead we need to do our best to remove those special characters.
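Removing characters is one approach. Another option, shown here only as a sketch, is to escape the XML-significant characters with the standard library so the markup stays well-formed. The `escapeForSSML` helper is a hypothetical name and is not part of the Skill's code. Escaping keeps the document valid, but it does not control how Alexa pronounces a symbol, so a replacement step may still be wanted.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// escapeForSSML escapes characters such as &, <, and > so the text
// can be placed inside a <speak> element without breaking the XML.
func escapeForSSML(text string) (string, error) {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(text)); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	escaped, err := escapeForSSML("Pots & Pans <50% off>")
	if err != nil {
		fmt.Println("escape failed:", err)
		return
	}
	fmt.Println("<speak>" + escaped + "</speak>")
}
```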

The ParseString function is my attempt at removing some of the bad characters or abbreviations that are found in the Slickdeals RSS feed. Every time we specify that we want Alexa to say something, we call the Say function and add the parsed text to the slice. Likewise, every time we want Alexa to pause in speech, we call the Pause function to add an XML pause in milliseconds. Building the SSML is the important step. We have our data in a slice, but we need to build it into a string. To build our SSML we can loop through the slice and concatenate everything in order, including the pauses. In the end, we wrap the string in <speak> XML tags.

SSML can get far more complex than my example. It can also be handled much better than I did it, but at least it will get you thinking.
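As one example of how the cleanup could be tightened, a plain substring replacement of an abbreviation like "fs" also rewrites the inside of longer words such as "surfside". The sketch below swaps in a different technique: it splits the title on whitespace and only replaces whole words. The `abbreviations` map and the `expandAbbreviations` function are hypothetical names, and the map is just a small subset for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// abbreviations maps whole lowercase words to their spoken form.
// This is an illustrative subset, not a complete list.
var abbreviations = map[string]string{
	"fs":  "free shipping",
	"f/s": "free shipping",
	"w/":  "with",
	"ac":  "after coupon",
	"s/h": "shipping and handling",
}

// expandAbbreviations lowercases the text and replaces only whole,
// whitespace-separated words that appear in the abbreviations map.
func expandAbbreviations(text string) string {
	words := strings.Fields(strings.ToLower(text))
	for i, word := range words {
		if spoken, ok := abbreviations[word]; ok {
			words[i] = spoken
		}
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Println(expandAbbreviations("Surfside Beach Chair w/ Canopy FS"))
}
```

A limitation of this sketch is that an abbreviation with punctuation attached, such as "fs," with a trailing comma, will not match, so real feed titles may need extra trimming first.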

Now heading back into our intent functions. If we look at the HandleHelpIntent, we can adjust it to the following:

func HandleHelpIntent(request alexa.Request) alexa.Response {
	var builder alexa.SSMLBuilder
	builder.Say("Here are some of the things you can ask:")
	builder.Pause("1000")
	builder.Say("Give me the frontpage deals.")
	builder.Pause("1000")
	builder.Say("Give me the popular deals.")
	return alexa.NewSSMLResponse("Slick Dealer Help", builder.Build())
}

In the above function, we’re adding our text, but we’re saying that there should be a one second pause between text. This will make things sound a little more natural.

The final two intent functions have a dependency on the remote data. Before we start scraping the RSS feed, we need to map it to a native Go data structure. In the main.go file, include the following:

type FeedResponse struct {
	Channel struct {
		Item []struct {
			Title string `xml:"title"`
			Link  string `xml:"link"`
		} `xml:"item"`
	} `xml:"channel"`
}

With the mapping in place, we can construct a function that we can reuse to make requests:

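func RequestFeed(mode string) (FeedResponse, error) {
	var feedResponse FeedResponse
	endpoint, err := url.Parse("https://slickdeals.net/newsearch.php")
	if err != nil {
		return feedResponse, err
	}
	// Build the query string that the RSS endpoint expects.
	queryParams := endpoint.Query()
	queryParams.Set("mode", mode)
	queryParams.Set("searcharea", "deals")
	queryParams.Set("searchin", "first")
	queryParams.Set("rss", "1")
	endpoint.RawQuery = queryParams.Encode()
	response, err := http.Get(endpoint.String())
	if err != nil {
		return feedResponse, err
	}
	// Close the body so the underlying connection can be released.
	defer response.Body.Close()
	data, err := ioutil.ReadAll(response.Body)
	if err != nil {
		return feedResponse, err
	}
	// Decode the RSS XML into the FeedResponse mapping.
	err = xml.Unmarshal(data, &feedResponse)
	return feedResponse, err
}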

The above RequestFeed function will return the data as a FeedResponse. Essentially, we’re constructing a request based on what the Slickdeals RSS feed expects. After we make the request, we can unmarshal the XML into our object to be returned. At this point, we just need to call our RequestFeed function.
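The XML mapping can be checked without touching the network. Here is a small sketch that unmarshals an inline, made-up RSS document into the same FeedResponse shape. The sample feed and the `parseFeed` helper are assumptions for illustration and are not real Slickdeals data.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// FeedResponse matches the mapping used in the tutorial.
type FeedResponse struct {
	Channel struct {
		Item []struct {
			Title string `xml:"title"`
			Link  string `xml:"link"`
		} `xml:"item"`
	} `xml:"channel"`
}

// sampleFeed is a made-up RSS document used only for this sketch.
const sampleFeed = `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample Feed</title>
    <item><title>Widget for $10</title><link>https://example.com/1</link></item>
    <item><title>Gadget for $20</title><link>https://example.com/2</link></item>
  </channel>
</rss>`

// parseFeed decodes RSS bytes into a FeedResponse.
func parseFeed(data []byte) (FeedResponse, error) {
	var feed FeedResponse
	err := xml.Unmarshal(data, &feed)
	return feed, err
}

func main() {
	feed, err := parseFeed([]byte(sampleFeed))
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for _, item := range feed.Channel.Item {
		fmt.Println(item.Title, item.Link)
	}
}
```

Elements that the struct does not mention, like the channel title here, are skipped by the decoder, so only the item titles and links are kept.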

Alter the HandleFrontpageDealIntent to look like the following:

func HandleFrontpageDealIntent(request alexa.Request) alexa.Response {
	feedResponse, _ := RequestFeed("frontpage")
	var builder alexa.SSMLBuilder
	builder.Say("Here are the current frontpage deals:")
	builder.Pause("1000")
	for _, item := range feedResponse.Channel.Item {
		builder.Say(item.Title)
		builder.Pause("1000")
	}
	return alexa.NewSSMLResponse("Frontpage Deals", builder.Build())
}

In the above intent, we are making a request for frontpage data. With the data, we can loop through the feed and create SSML with our builder. If you’re new to Slickdeals, there are a lot of records on every page, so having no pause between deals would be a nightmare.

Similarly, we can create the following HandlePopularDealIntent function:

func HandlePopularDealIntent(request alexa.Request) alexa.Response {
	feedResponse, _ := RequestFeed("popdeals")
	var builder alexa.SSMLBuilder
	builder.Say("Here are the current popular deals:")
	builder.Pause("1000")
	for _, item := range feedResponse.Channel.Item {
		builder.Say(item.Title)
		builder.Pause("1000")
	}
	return alexa.NewSSMLResponse("Popular Deals", builder.Build())
}

The setup to the above function is the same, but our RequestFeed is receiving a different value which represents the different type of deals.
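Since the two deal intents are nearly identical and both ignore the error from RequestFeed, one possible refinement is a shared helper that handles a failed fetch and caps how many deals are read out. The sketch below is a standalone approximation: `buildDealSpeech`, `errFeedUnavailable`, and the cap are hypothetical, and it builds the SSML string directly rather than using the SSMLBuilder. It also assumes the titles were already cleaned for speech.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errFeedUnavailable is a placeholder error used to simulate a failed fetch.
var errFeedUnavailable = errors.New("feed unavailable")

// buildDealSpeech returns SSML for up to maxItems titles, or a short
// apology when the fetch failed or the feed came back empty.
func buildDealSpeech(intro string, titles []string, fetchErr error, maxItems int) string {
	if fetchErr != nil || len(titles) == 0 {
		return "<speak>Sorry, I could not load the deals right now.</speak>"
	}
	if maxItems >= 0 && len(titles) > maxItems {
		titles = titles[:maxItems]
	}
	// Put the intro first, then separate every part with a one second pause.
	parts := append([]string{intro}, titles...)
	return "<speak>" + strings.Join(parts, " <break time='1000ms'/> ") + "</speak>"
}

func main() {
	titles := []string{"widget for ten dollars", "gadget for twenty dollars", "gizmo for thirty dollars"}
	fmt.Println(buildDealSpeech("here are the current frontpage deals:", titles, nil, 2))
	fmt.Println(buildDealSpeech("here are the current popular deals:", nil, errFeedUnavailable, 2))
}
```

With a helper like this, each intent function would only differ in the mode passed to RequestFeed and the intro sentence.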

At this point, our simple Skill is ready to be configured in the Amazon Developer Portal.

Configuring an Alexa Skill for Deployment

Amazon has a separate portal for Alexa developers versus users of Amazon Web Services. Go to the Amazon Developer Portal and create a new Alexa Skill. You’ll want to choose the defaults as we’re not creating a smart home skill or anything fancy.

After naming your Skill, you’ll be brought to the main dashboard:

Alexa Developer Dashboard

The dashboard has a checklist of all the things that must be met before you can publish your Skill to the general public. The first step in the checklist is to pick an invocation name. Take it from someone who has done this before. Pick a name that is easy to understand, otherwise Alexa will never know what you’re talking about. If you pick an invocation name like “McFlibbets”, good luck to you.

After you have an invocation name, you need to assign your intents to utterances. An intent should be created for every intent you have in your Go application and the intents should have exactly the same name. For us, we had AboutIntent, FrontpageDealIntent, and PopularDealIntent. The HelpIntent used a reserved AMAZON.HelpIntent.

For each intent, we need to define sample utterances which are phrases to activate the function. For example, we might have the following for the FrontpageDealIntent:

get the latest frontpage deals
what are the frontpage deals
give me the frontpage deals

Above are just three examples of many. The more sample utterances you have, the better your Skill will behave. Think about every possible phrase a user might try in order to use your Skill. They should all be included for each intent.

Skipping to the fourth step, we need to link the Skill to the AWS Lambda function. In the Lambda dashboard you should see the function’s ID (its ARN). Copy it and paste it into the Alexa dashboard. Likewise, you can take the Alexa Skill ID and paste it in your Lambda dashboard for extra security.

Assuming everything went well, you can test your Skill using the simulator and then choose to publish.

Conclusion

You just saw how to create an Amazon Alexa Skill with Golang that is hosted on AWS Lambda. Essentially, we created a Lambda function that parses the Slickdeals RSS feed and returns Alexa formatted data. This tutorial can be expanded beyond Slickdeals because all we really did was parse an RSS feed.

There are perks to creating Alexa Skills, like I mentioned previously. I submitted this Skill to obtain an Amazon Echo Show, but my previous Skill got me an Echo Dot as well as $100.00 per month in AWS credit that I’ve been receiving for almost two years now.

Nic Raboy

Nic Raboy is an advocate of modern web and mobile development technologies. He has experience in C#, JavaScript, Golang and a variety of frameworks such as Angular, NativeScript, and Unity. Nic writes about his development experiences related to making web and mobile development easier to understand.