Writing Self Hosted Alexa Skills With Golang

A couple of years ago I was lucky enough to win an Echo Dot in a company hackathon. Since then I have been trying to develop Alexa Skills that interest me in my spare time. Before exploring this new field of development, I had been interested in learning and practicing a language that was new to me, Golang (or just Go). Considering Alexa Skills are based on web services, one of the areas where Go excels, it seemed like a great way to “have my cake and eat it too.”

A couple of months ago I came across a great post by Nic Raboy on writing Alexa Skills with Golang and AWS Lambda, which can be found here. Most of the Skills I have developed were started before Lambda had first-class support for Go, so I am much more comfortable writing Skills using self-hosted web services. Using Lambda for Alexa Skills is definitely a great approach, but there are some instances where using your own server might make more sense. If you are looking to reuse an existing server or rapidly prototype an idea, then this approach may be a better fit.

In this post, I will detail the steps necessary to deploy a web service that can be used to fulfill Alexa Skill requests. To make it easier to compare this approach with using Lambda, the functionality of the Skill will remain almost identical to Nic Raboy’s example. It is only the deployment process that will be changed.

Alexa Developer Console Setup

The first step in the process is to define the skill in the Alexa Developer Console. After signing in to the console with your Amazon developer account, you should be presented with a blank page and a “Create Skill” button. Clicking the button will present a new page where the Skill name, default language and skill model can be specified. This guide won’t be using a pre-built model so you can just select the custom option. The next page will provide a way to select a template. This won’t be used here either so “Start from scratch” is the right choice. Once this is complete, the main skill “Build” page is shown. It should look something like this:

Skill Builder Checklist

On the right side of the page is a helpful checklist that shows all of the steps that need to be completed. Note all but one of the items in the checklist shown above are required. Configuring In-Skill Purchasing is an optional step at the time of this writing. At this point the Invocation Name step is marked as complete. The invocation name is inferred from the skill name provided in the previous step. If you would like to provide a different invocation name, either click that item in the checklist or select “Invocation” from the menu on the left. This string can be changed up until your skill passes certification.

Defining the Intents

The next step of the process is to define the intents that will be handled by the skill. To get started, click the “Add” button next to “Intents ()” in the left menu. Fill in a name for the new intent and click the create button. At this point the sample utterances can be added. Try to define as many sample utterances as possible. The more ways of invoking an intent you list, the more reliably Alexa will map what a user says to the expected intent. This intent definition view is also where you can define any “slots” that may be used in your intents. Slots are a way for the Alexa service to pass variable values to your web service based on what the user says. There is a wide range of built-in slot types that can be used out of the box, as well as a way to define your own custom types. For more information about defining intents and slots for your interaction model, refer to the documentation on this page.

Similar to the Slick Dealer skill running in Alexa, three custom intents will be used. The first one to define is FrontpageDealIntent. For the name, enter FrontpageDealIntent. There are a lot of different ways that a user could request the front page deals. Here are a few sample utterances that can be used, but feel free to add more:

get the latest front page deals
what are the front page deals
give me the front page deals
give me the frontpage deals
what are the frontpage deals

Now would be a good time to save the model defined so far and move on to defining the rest of the intents. The other intents that will be used in this guide are the PopularDealIntent and the AboutIntent. When an intent without any slots is fully defined it should look something like this:

Intent Definition

Once all of the intents are fully defined it is time to save the model one more time and then click the “Build Model” button. The build process could take some time but a pop-up will be shown on the page when the build completes. Assuming there are no unexpected errors building your model, the Skill builder checklist should now show the first three steps as complete.

Endpoint and SSL

Up to this point things have been identical to the Lambda approach for building a custom Skill. This is where the two paths start to diverge. Clicking “Endpoint” in the left menu opens a page where you define the endpoint that handles the requests sent to your custom Skill. In the previous post, an AWS Lambda ARN was used. In this guide, HTTPS will be used instead. Choosing HTTPS adds a few more options for defining endpoints for specific regions. In our case, a single default region will be used. The hosting for the web service has not been set up yet, so a temporary placeholder can be used for this step. Enter a URI like https://my-domain-placeholder.com/echo/slickdeals for now. Below the URI is a menu for selecting the type of SSL certificate that will be used with your endpoint. There is an option to upload a self-signed certificate in X.509 format if that is the route you would like to take. Another option is to get a valid certificate for a domain that you own from a trusted certificate authority (I personally prefer Let’s Encrypt). For this example we will be using Heroku for hosting our web service, so the option to choose would be:

My development endpoint is a sub-domain of a domain that has a valid wildcard certificate from a certificate authority

This will work because Heroku has a valid certificate for all *.herokuapp.com domains, so SSL will be handled for us automatically. More details on this will be provided later when we get to the deployment steps. Don’t forget to click “Save Endpoints” after filling in these details.

There are a number of requirements that our web service will need to enforce to be able to pass Skill certification. The key requirements include:

  • The web service must provide a valid HTTPS endpoint and accept requests on port 443
  • Verify that the request was sent by Alexa
  • Check the signature of the request
  • Check the timestamp of the request

The reference documentation for the web service requirements can be found on this page. Luckily for us, the HTTPS requirements will be handled entirely by Heroku. The last three requirements will be handled by the go-alexa library that will be used in the code portion of this guide. More details on this in the coding section.
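
To give a feel for what a library has to check on our behalf, here is a minimal standard-library sketch of two of those checks: validating the certificate chain URL that arrives with each request, and rejecting stale timestamps. It is not the go-alexa implementation. The function names and the fixed exampleNow clock are my own, and the rules encoded here (HTTPS, host s3.amazonaws.com, path under /echo.api/, port 443 if present, 150 second tolerance) reflect my reading of Amazon's documentation. Verifying the request signature itself is deliberately left out.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
	"time"
)

// maxClockSkew is the timestamp tolerance assumed in this sketch.
const maxClockSkew = 150 * time.Second

// exampleNow is a fixed "current time" so the example is repeatable.
var exampleNow = time.Date(2018, 12, 8, 4, 16, 24, 0, time.UTC)

// validCertURL reports whether a certificate chain URL points at the
// location this sketch assumes Amazon uses for its signing certificates.
func validCertURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	if !strings.EqualFold(u.Scheme, "https") {
		return false
	}
	if !strings.EqualFold(u.Hostname(), "s3.amazonaws.com") {
		return false
	}
	// A port is optional, but if one is given it must be 443.
	if p := u.Port(); p != "" && p != "443" {
		return false
	}
	// Normalize "." and ".." segments before the case-sensitive prefix check.
	return strings.HasPrefix(path.Clean(u.Path), "/echo.api/")
}

// timestampFresh reports whether an RFC 3339 timestamp is within
// maxClockSkew of now, in either direction.
func timestampFresh(ts string, now time.Time) bool {
	t, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		return false
	}
	skew := now.Sub(t)
	if skew < 0 {
		skew = -skew
	}
	return skew <= maxClockSkew
}

func main() {
	fmt.Println(validCertURL("https://s3.amazonaws.com/echo.api/echo-api-cert.pem"))
	fmt.Println(timestampFresh("2018-12-08T04:15:00Z", exampleNow))
}
```

In a real service the current time would come from time.Now() rather than a fixed value.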

At this point all the required steps in the Skill builder checklist are complete and it is time to move on to configuring the web service.

Heroku Console Setup

Heroku is a popular Platform as a Service (PaaS) product that lets you deploy your web service as easily as pushing your code to a special git remote. For information about getting started with Go on Heroku, refer to their language support page. The free tier offered by Heroku is a great way to quickly implement a free prototype of an Alexa Skill.

Start by creating an account and a new app in the Heroku dashboard. Once the app is created, the Deploy tab will have instructions for deploying a project using this app’s configuration. We will come back to this later when it is time to push the first version of the code. At this point we can go back and complete the HTTPS endpoint configuration in the Alexa developer console that we skipped earlier. To find the app’s domain, navigate to the Settings tab in the Heroku dashboard and look under the “Domains and certificates” section. The endpoint I have configured for my app is:

https://slick-deals.herokuapp.com/echo/slickdeals

Note that the newly created app is hosted on a sub-domain of the herokuapp.com domain, as mentioned earlier when specifying the SSL certificate settings in the developer console. It is possible to provide a custom domain if you would like. This could be useful later if you choose to also serve a web page with information about your skill or possibly redirect from a custom domain to your Skill’s page on amazon.com. For now, using the provided domain is enough to proceed. The /echo/slickdeals path is optional but makes it easier to group requests to the same server if necessary.

Alexa Skill Code

Now for the fun part: implementing the Go code that will actually respond to the Alexa Skill intents. The package that we will use for marshaling Alexa request and response types as well as validating requests is: github.com/mikeflynn/go-alexa/skillserver. The first step is to define the application configuration that will be used when the skillserver is run. The skillserver package uses a custom EchoApplication type for defining a handler for a specific request path. The application configuration provides separate properties for defining individual handlers for the different Alexa Skills Kit request types. Here is a short description of each request type:

  • LaunchRequest: the user opens the skill but does not invoke a specific intent
  • IntentRequest: the user invokes your skill with a command mapping to one of your intents
  • SessionEndedRequest: the Alexa service is letting your web service know that the session is now ended
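
To make the three request types more concrete, here is a minimal standard-library sketch that decodes a request body and reports which kind it is. The envelope type, the describe function and the sample body are my own illustrations, trimmed down to the few JSON fields used here. The skillserver package has its own, much more complete request type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope mirrors only the parts of an Alexa request body used below.
type envelope struct {
	Session struct {
		Application struct {
			ApplicationID string `json:"applicationId"`
		} `json:"application"`
	} `json:"session"`
	Request struct {
		Type   string `json:"type"`
		Intent struct {
			Name string `json:"name"`
		} `json:"intent"`
	} `json:"request"`
}

// sampleIntentRequest is a hand-written, heavily trimmed example body.
const sampleIntentRequest = `{
  "version": "1.0",
  "session": {"application": {"applicationId": "example-skill-id"}},
  "request": {
    "type": "IntentRequest",
    "timestamp": "2018-12-08T04:16:24Z",
    "intent": {"name": "FrontpageDealIntent"}
  }
}`

// describe summarizes which kind of request a JSON body carries.
func describe(body string) (string, error) {
	var env envelope
	if err := json.Unmarshal([]byte(body), &env); err != nil {
		return "", err
	}
	switch env.Request.Type {
	case "LaunchRequest":
		return "launch", nil
	case "IntentRequest":
		return "intent " + env.Request.Intent.Name, nil
	case "SessionEndedRequest":
		return "session ended", nil
	default:
		return "", fmt.Errorf("unsupported request type %q", env.Request.Type)
	}
}

func main() {
	summary, err := describe(sampleIntentRequest)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(summary)
}
```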

More about these different request types can be found here. Here is the EchoApplication configuration for the Slick Dealer skill.

var (
	slickDealsAppID = os.Getenv("SLICK_DEALS_APP_ID")
	applications    = map[string]interface{}{
		"/echo/slickdeals": skillserver.EchoApplication{
			AppID:          slickDealsAppID,
			OnIntent:       intentHandler,
			OnLaunch:       launchHandler,
			OnSessionEnded: sessionEndedHandler,
		},
	}
)

So what is really happening here? The first step is to get the Alexa Skill ID. This value is provided in the Alexa Developer Console. Just click “View Skill ID” under your skill’s name. This value is used to verify that the requests being sent to your service originate from the correct skill. If the app ID sent in the request does not match the one provided here, the skillserver package will return an error status automatically. This behavior is also required for Skill certification. The next step is to define a mapping between the endpoints that should be accepted by your service and the application configuration that should be used for each endpoint. The keen observer will notice that the mapping uses the empty interface type rather than the concrete skillserver.EchoApplication. There is a good reason for this. The skillserver package also provides a StdApplication type that can be used to add standard HTTP handlers for a specific path. This could be useful if you would like to perform request validation on your own or if you would like to accept standard HTTP requests. For example, a privacy policy page could be served at a different path from the same Go binary. Deploying a single Go binary instead of multiple servers for the different pieces could come in handy when trying to quickly deploy a low-cost Skill.
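
The empty interface trick is a general Go pattern rather than anything specific to Alexa. Here is a minimal sketch of how a map of mixed values can be told apart with a type switch. The alexaApp and plainApp types are hypothetical stand-ins I made up for illustration, and I have not checked how skillserver dispatches internally, so treat this as the general idea only.

```go
package main

import "fmt"

// alexaApp and plainApp are hypothetical stand-ins for two different
// application configuration types stored in the same map.
type alexaApp struct{ AppID string }
type plainApp struct{ Methods string }

// kindOf uses a type switch to discover the concrete type behind
// an empty interface value.
func kindOf(app interface{}) string {
	switch app.(type) {
	case alexaApp:
		return "alexa"
	case plainApp:
		return "http"
	default:
		return "unknown"
	}
}

func main() {
	apps := map[string]interface{}{
		"/echo/slickdeals": alexaApp{AppID: "example-skill-id"},
		"/privacy":         plainApp{Methods: "GET"},
	}
	// Map iteration order is not fixed, so the lines may print in any order.
	for route, app := range apps {
		fmt.Println(route, kindOf(app))
	}
}
```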

Now it is time to actually define these handlers that we are trying to use. For the time being they can just be stubbed out with something like this:

func launchHandler(request *skillserver.EchoRequest, echoResponse *skillserver.EchoResponse) {
	echoResponse.OutputSpeech("You have successfully launched a new session.")
	echoResponse.EndSession(false)
}

func sessionEndedHandler(request *skillserver.EchoRequest, echoResponse *skillserver.EchoResponse) {
	echoResponse.OutputSpeech("Session ended.")
}

func intentHandler(request *skillserver.EchoRequest, echoResponse *skillserver.EchoResponse) {
	echoResponse.OutputSpeech(fmt.Sprintf("You have invoked the %s intent.", request.GetIntentName()))
}

These handlers should look very similar to the standard request handlers provided in the net/http package. The difference here is that the skillserver is handling the type transformation for the requests and responses to these custom types. The OutputSpeech method can be used to provide a string that should be spoken back to the user. One more key piece to note is in the launchHandler function. The EndSession flag needs to be explicitly set to false here. By default the flag is set to true and the session is closed.
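
It can help to see roughly what these calls turn into on the wire. The following is a minimal sketch of a plain-text response body built with encoding/json. The struct and function names are my own, and the layout is a trimmed-down version of the Alexa response format as I understand it, not the skillserver package's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// outputSpeech holds the text Alexa should speak.
type outputSpeech struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// responseBody carries the speech and the flag that keeps the session open.
type responseBody struct {
	OutputSpeech     outputSpeech `json:"outputSpeech"`
	ShouldEndSession bool         `json:"shouldEndSession"`
}

// responseEnvelope is the top-level JSON object sent back to Alexa.
type responseEnvelope struct {
	Version  string       `json:"version"`
	Response responseBody `json:"response"`
}

// plainTextResponse builds the JSON for a simple spoken reply.
func plainTextResponse(text string, endSession bool) (string, error) {
	env := responseEnvelope{
		Version: "1.0",
		Response: responseBody{
			OutputSpeech:     outputSpeech{Type: "PlainText", Text: text},
			ShouldEndSession: endSession,
		},
	}
	b, err := json.Marshal(env)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := plainTextResponse("You have successfully launched a new session.", false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

Passing false for endSession corresponds to the EndSession(false) call in launchHandler: the session stays open so the user can follow up with an intent.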

With the application configured and the handlers defined the only thing left to do is actually start the server. Here is a minimal main definition to get the server up and running:

func main() {

	port := os.Getenv("PORT")

	skillserver.Run(applications, port)
}

The skillserver package will handle setting up all of the handlers with their respective paths and start listening on the provided port. Dynamically loading the port number like this is required to deploy a web service to Heroku. A port will be dynamically provided in the environment when Heroku starts your application. If you decide to deploy your application to a server where you need to manage the SSL configuration yourself, you will need to use the following function to start your server:

func RunSSL(apps map[string]interface{}, port, cert, key string)
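
One practical wrinkle when running the same binary on your own machine is that PORT is usually not set. Here is a minimal sketch of a fallback, assuming 8080 as a local default (my choice, not something Heroku or skillserver requires). The listenPort helper is hypothetical, and it takes the lookup function as a parameter only so that it is easy to test.

```go
package main

import (
	"fmt"
	"os"
)

// listenPort returns the PORT value from the given lookup function,
// or fallback when PORT is unset or empty.
func listenPort(getenv func(string) string, fallback string) string {
	if port := getenv("PORT"); port != "" {
		return port
	}
	return fallback
}

func main() {
	// On Heroku PORT is provided; locally this falls back to 8080.
	port := listenPort(os.Getenv, "8080")
	fmt.Println("would listen on port", port)
}
```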

In less than 50 lines of code you have a minimal application that can handle requests from the Alexa service and send back (somewhat boring) text-to-speech responses. It is worth noting that by using Heroku and the go-alexa package, our code doesn’t need to know anything about request validation or HTTPS. With the basic structure of the application in place, it is time to define functions that will handle the individual intents that were set up earlier in the Alexa console. If you’ve already gone through the Lambda walkthrough, these functions should look very similar. Here is the function for the AboutIntent:

func handleAboutIntent() *skillserver.EchoResponse {

	response := skillserver.NewEchoResponse()
	response.OutputSpeech("Slick Dealer was created by Rob in New Hampshire as an unofficial Slick Deals application.")
	response.SimpleCard("About", "Slick Dealer was created by Rob in New Hampshire as an unofficial Slick Deals application.")

	return response
}

The key difference here is that the skillserver package requires separate steps for providing the text-to-speech string and the card that is shown in the Alexa companion app. Next is the built-in AMAZON.HelpIntent:

func handleHelpIntent() *skillserver.EchoResponse {

	response := skillserver.NewEchoResponse()
	builder := skillserver.NewSSMLTextBuilder()

	builder.AppendSentence("Here are some things you can ask: ")
	builder.AppendSentence("Give me the frontpage deals.")
	builder.AppendSentence("Give me the popular deals.")

	return response.OutputSpeechSSML(builder.Build())
}
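
For readers who have not seen SSML before, here is a rough standard-library sketch of the kind of string a sentence builder produces. The ssmlSentences function is my own and the exact markup generated by the skillserver builder may differ. One detail worth copying either way is escaping, since titles containing characters like & would otherwise produce invalid markup.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// ssmlSentences wraps each sentence in an <s> element inside a single
// <speak> root and escapes XML special characters in the text.
func ssmlSentences(sentences ...string) string {
	var b strings.Builder
	b.WriteString("<speak>")
	for _, s := range sentences {
		b.WriteString("<s>")
		// EscapeText only fails if the writer fails, and a
		// strings.Builder never returns a write error.
		_ = xml.EscapeText(&b, []byte(s))
		b.WriteString("</s>")
	}
	b.WriteString("</speak>")
	return b.String()
}

func main() {
	fmt.Println(ssmlSentences(
		"Here are some things you can ask:",
		"Give me the frontpage deals.",
		"Give me the popular deals.",
	))
}
```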

The skillserver package uses a builder pattern for constructing the Speech Synthesis Markup Language (SSML) response, similar to the Lambda example. The AppendSentence method in this example is used to add a short pause between each string. For a refresher on SSML functionality supported by Alexa, take a look at their SSML reference documentation. Next up are the functions for handling the FrontpageDealIntent and PopularDealIntent:

func handleFrontPageDealIntent() *skillserver.EchoResponse {

	feed, err := requestFeed("frontpage")
	if err != nil {
		// A nil response lets intentHandler send its generic error message.
		return nil
	}
	builder := skillserver.NewSSMLTextBuilder()
	cardBody := strings.Builder{}

	builder.AppendSentence("Here are the current frontpage deals:")
	cardBody.WriteString("Here are the current frontpage deals:")

	// Read at most three deals; the feed may contain fewer.
	items := feed.Channel.Item
	if len(items) > 3 {
		items = items[:3]
	}
	for _, item := range items {
		builder.AppendSentence(item.Title)
		// Put each title on its own line of the card.
		cardBody.WriteString("\n" + item.Title)
	}

	response := skillserver.NewEchoResponse()
	response.OutputSpeechSSML(builder.Build())
	response.SimpleCard("Frontpage Deals", cardBody.String())
	return response
}

func handlePopularDealIntent() *skillserver.EchoResponse {

	feed, err := requestFeed("popdeals")
	if err != nil {
		// A nil response lets intentHandler send its generic error message.
		return nil
	}
	builder := skillserver.NewSSMLTextBuilder()
	cardBody := strings.Builder{}

	builder.AppendSentence("Here are the current popular deals:")
	cardBody.WriteString("Here are the current popular deals:")

	// Read at most three deals; the feed may contain fewer.
	items := feed.Channel.Item
	if len(items) > 3 {
		items = items[:3]
	}
	for _, item := range items {
		builder.AppendSentence(item.Title)
		// Put each title on its own line of the card.
		cardBody.WriteString("\n" + item.Title)
	}

	response := skillserver.NewEchoResponse()
	response.OutputSpeechSSML(builder.Build())
	response.SimpleCard("Popular Deals", cardBody.String())
	return response
}

These two functions are very similar. The first step is to request the correct feed from Slick Deals. The next is to iterate over the items in the response and build an SSML response as well as the body of the card. Finally, a new response is configured using the SSML string and a card. Both of these functions depend on the requestFeed function, which is essentially the same as the one in the Lambda example:

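func requestFeed(mode string) (*feedResponse, error) {

	endpoint, err := url.Parse("https://slickdeals.net/newsearch.php")
	if err != nil {
		return nil, err
	}
	queryParams := endpoint.Query()
	queryParams.Set("mode", mode)
	queryParams.Set("searcharea", "deals")
	queryParams.Set("searchin", "first")
	queryParams.Set("rss", "1")

	endpoint.RawQuery = queryParams.Encode()
	response, err := http.Get(endpoint.String())
	if err != nil {
		return nil, err
	}
	// Close the body so the underlying connection can be reused.
	defer response.Body.Close()

	data, err := ioutil.ReadAll(response.Body)
	if err != nil {
		return nil, err
	}

	// feed is already a pointer, so it is passed to Unmarshal directly.
	feed := &feedResponse{}
	if err := xml.Unmarshal(data, feed); err != nil {
		return nil, err
	}

	return feed, nil
}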
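
The feedResponse type itself is not shown in this post. Here is a minimal sketch of what it could look like, assuming the feed follows the usual RSS 2.0 layout of a channel containing item elements. Only Channel, Item and Title are taken from the code above. The Link field, the parseFeed helper and the sample feed are my own additions for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// feedResponse maps the parts of an RSS document that the handlers read.
type feedResponse struct {
	Channel struct {
		Item []struct {
			Title string `xml:"title"`
			Link  string `xml:"link"`
		} `xml:"item"`
	} `xml:"channel"`
}

// sampleFeed is a hand-written stand-in for a real feed response.
const sampleFeed = `<rss version="2.0"><channel><title>Sample</title>
<item><title>Deal one</title><link>https://example.com/1</link></item>
<item><title>Deal two</title><link>https://example.com/2</link></item>
</channel></rss>`

// parseFeed decodes RSS text into a feedResponse.
func parseFeed(data string) (*feedResponse, error) {
	feed := &feedResponse{}
	if err := xml.Unmarshal([]byte(data), feed); err != nil {
		return nil, err
	}
	return feed, nil
}

func main() {
	feed, err := parseFeed(sampleFeed)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, item := range feed.Channel.Item {
		fmt.Println(item.Title)
	}
}
```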

With all of these functions now defined to handle each of the intents, it is time to update intentHandler to actually use this new code. Here is what it looks like now:

func intentHandler(request *skillserver.EchoRequest, echoResponse *skillserver.EchoResponse) {

	var response *skillserver.EchoResponse

	switch request.GetIntentName() {
	case frontpageDealIntent:
		response = handleFrontPageDealIntent()
	case popularDealIntent:
		response = handlePopularDealIntent()
	case helpIntent:
		response = handleHelpIntent()
	case aboutIntent:
		fallthrough
	default:
		response = handleAboutIntent()
	}

	if response == nil {
		response = skillserver.NewEchoResponse()
		response.OutputSpeech("Sorry, something went wrong loading the deals. Please try again later.")
	}

	*echoResponse = *response
}

The intent name is loaded from the EchoRequest and used to determine which one of the helper functions should be called. The returned response is then copied into the EchoResponse that the skillserver package expects to be filled.
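
Two details in that handler are easy to miss: the intent name constants are not shown anywhere in this post, and the result is copied through the pointer rather than assigned to it. Here is a minimal sketch of both, assuming the constant values match the intent names entered in the console plus the built-in AMAZON.HelpIntent. The reply type and the fill function are hypothetical stand-ins for skillserver.EchoResponse and intentHandler.

```go
package main

import "fmt"

// Assumed values: these must match the intent names in the Alexa console.
const (
	frontpageDealIntent = "FrontpageDealIntent"
	popularDealIntent   = "PopularDealIntent"
	aboutIntent         = "AboutIntent"
	helpIntent          = "AMAZON.HelpIntent"
)

// reply is a hypothetical stand-in for skillserver.EchoResponse.
type reply struct{ Speech string }

// fill picks a reply for the intent and copies it into out.
func fill(out *reply, intent string) {
	var r *reply
	switch intent {
	case frontpageDealIntent:
		r = &reply{Speech: "frontpage deals"}
	case popularDealIntent:
		r = &reply{Speech: "popular deals"}
	case helpIntent:
		r = &reply{Speech: "help"}
	case aboutIntent:
		fallthrough
	default:
		r = &reply{Speech: "about"}
	}
	// Writing "out = r" would only rebind the local pointer. Copying the
	// value through the pointer changes what the caller sees.
	*out = *r
}

func main() {
	var out reply
	fill(&out, helpIntent)
	fmt.Println(out.Speech)
}
```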

Heroku Deployment

At this point we now have the three required pieces to deploy a great Alexa Skill. All that is left is to actually perform a deployment. The first step to prepare the app for deployment is to vendor any dependencies that were included. When a project is pushed to Heroku, it is the source that is being sent. To build the project, Heroku will need access to the required dependencies. Since vendoring dependencies in Go is a topic all on its own, I’ll refer you to the Supported Dependency/Vendor Managers Heroku page. For this example, I’ve chosen to use Godep, mostly because it was the first in the list. The following commands can be used to download the godep command, save any dependencies and metadata, and commit those additions on the current branch.

go get -u github.com/tools/godep

# Creates Godeps/Godeps.json and copies dependencies into the vendor/ directory
godep save ./...

git add -A .
git commit -m "Vendoring dependencies"

More information about these commands as well as information about adding and updating more dependencies later can be found in this Godep guide provided by Heroku. After committing all of the dependencies, the next step is to add the git remote for our Heroku application. In your Heroku app’s dashboard there will be a Settings tab. Navigate to that and under the Info section there will be a “Heroku Git URL” value. Add this remote to your local git configuration with this command:

git remote add heroku <HEROKU_GIT_URL>

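With the remote configured, it is now time to push this code to Heroku for it to be built and executed. Because the code reads the Skill ID from the SLICK_DEALS_APP_ID environment variable, make sure that value is set as a config var for the app (under “Config Vars” in the Settings tab) so that requests from your skill are accepted. This command will send the code to Heroku to create a new deployment: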

git push heroku master

It is difficult to properly construct an Alexa request by hand, so I came up with a little trick to test that my app is deployed and running correctly. Adding a skillserver.StdApplication that just returns a basic string provides an endpoint that can be used to check the health of your service. Something like this will do the trick:

"/health": skillserver.StdApplication{
	Methods: "GET",
	Handler: func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Ok!"))
	},
},

If you have the Heroku CLI tools installed (available via Homebrew on macOS), you can run the heroku logs -t command to monitor your application’s logs. Navigating to the app’s URL with the /health request path should show a page containing only the ‘Ok!’ string and a line like this in the logs:

2018-12-08T04:16:24.679839+00:00 app[web.1]: [negroni] 2018-12-08T04:16:24Z | 200 | 	 123.793µs | slick-deals.herokuapp.com | GET /health
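
The same handler can also be exercised locally before pushing, without starting a server at all. Here is a minimal sketch using the standard net/http/httptest package. The healthHandler and checkHealth names are my own, and this tests only the handler function, not the skillserver routing around it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthHandler mirrors the handler body used for the /health path.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("Ok!"))
}

// checkHealth runs the handler against an in-memory request and
// returns the status code and response body.
func checkHealth() (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	rec := httptest.NewRecorder()
	healthHandler(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := checkHealth()
	fmt.Println(code, body)
}
```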

Conclusion

That’s all there is to it. This guide provides a straightforward way to begin developing an Alexa Skill using Golang and Heroku for free. There are a lot of different ways to improve on this base implementation. More performant Heroku dynos can be used for a price, or a different cloud provider, like DigitalOcean, could be used if more customization is necessary.

Be sure to check the Alexa Certification requirements to verify that your skill meets all of the requirements before submitting for certification. Keep an eye out for any promotions offered by Amazon. They often give AWS credits or devices for newly deployed skills. Have fun and good luck!

Rob King

Rob King is a full-time Infrastructure Engineer at Lose It! and part-time Gopher. His hobby projects include mobile applications and virtual assistant technologies. Outside of work he enjoys camping, snowboarding and traveling.