Analyze Stack Overflow Data With Golang And HTTP

I was recently tasked with a project where I needed to gather data from Stack Overflow so it could be easily evaluated without having to dig around the website. Stack Exchange has many REST APIs available, some of which don’t even need tokens or authentication, so it came down to how I wanted to consume this data.

In this tutorial, we’re going to see how to consume question and comment data from the Stack Exchange API using Golang and then export it to a comma-separated values (CSV) file for further evaluation.

For clarity on what we’re going to accomplish, we’re going to get all unanswered questions for a given tag and we’re going to count the comments for each of those particular questions. This makes it easy to see what questions need attention at a glance.

Creating Go Data Structures for Mapping Stack Overflow Responses

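Because we’re going to be working with API data, we first need to map the responses of the API, otherwise the data will be complicated to work with. While we’re going to be using three different API endpoints, two of them share the same response format. We’ll be looking at the following: /questions/unanswered and /questions/no-answers, which share the question format, and /questions/{id}/comments, which returns the comments for a single question.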

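If you haven’t already, create a project somewhere in your $GOPATH and create a main.go file. All of the code in this tutorial relies only on the standard library, so main.go needs to import encoding/csv, encoding/json, errors, flag, fmt, io/ioutil, net/http, net/url, os, strconv, strings, and time. Per the Stack Exchange documentation, we can model the response data for questions like the following: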

type QuestionsResponse struct {
	Items []struct {
		Tags  []string `json:"tags"`
		Owner struct {
			Reputation  int    `json:"reputation"`
			UserId      int64  `json:"user_id"`
			UserType    string `json:"user_type"`
			DisplayName string `json:"display_name"`
			Link        string `json:"link"`
		} `json:"owner"`
		IsAnswered       bool   `json:"is_answered"`
		ViewCount        int    `json:"view_count"`
		AnswerCount      int    `json:"answer_count"`
		Score            int    `json:"score"`
		LastActivityDate int64  `json:"last_activity_date"`
		CreationDate     int64  `json:"creation_date"`
		LastEditDate     int64  `json:"last_edit_date"`
		QuestionId       int64  `json:"question_id"`
		Link             string `json:"link"`
		Title            string `json:"title"`
	} `json:"items"`
	HasMore        bool `json:"has_more"`
	QuotaMax       int  `json:"quota_max"`
	QuotaRemaining int  `json:"quota_remaining"`
}

The JSON struct tags in the model above match the field names in the JSON response. With these tags in place, Go's encoding/json package can map a response directly onto one of these structs.
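As a minimal sketch of that mapping, the following program decodes a hand-written JSON fragment into a trimmed-down struct. The sample payload and the miniResponse and decodeMini names are illustrative assumptions, not real API output or part of the application.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// miniResponse is a hypothetical, trimmed-down version of QuestionsResponse
// used only to illustrate how struct tags drive decoding.
type miniResponse struct {
	Items []struct {
		QuestionId int64  `json:"question_id"`
		Title      string `json:"title"`
	} `json:"items"`
	HasMore bool `json:"has_more"`
}

// decodeMini unmarshals raw JSON into a miniResponse.
func decodeMini(raw []byte) (miniResponse, error) {
	var r miniResponse
	err := json.Unmarshal(raw, &r)
	return r, err
}

func main() {
	// Hand-written sample payload, not a real API response.
	raw := []byte(`{"items":[{"question_id":42,"title":"Example"}],"has_more":true}`)
	r, err := decodeMini(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(r.Items[0].QuestionId, r.Items[0].Title, r.HasMore)
}
```

Fields present in the JSON but absent from the struct are simply ignored, which is why the models only need to list the fields we care about.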

For the comment data, we can create a data structure like the following:

type CommentsResponse struct {
	Items []struct {
		Owner struct {
			Reputation  int    `json:"reputation"`
			UserId      int64  `json:"user_id"`
			UserType    string `json:"user_type"`
			DisplayName string `json:"display_name"`
			Link        string `json:"link"`
		} `json:"owner"`
		ReplyToUser struct {
			Reputation  int    `json:"reputation"`
			UserId      int64  `json:"user_id"`
			UserType    string `json:"user_type"`
			AcceptRate  int    `json:"accept_rate"`
			DisplayName string `json:"display_name"`
			Link        string `json:"link"`
		} `json:"reply_to_user"`
		Edited       bool  `json:"edited"`
		Score        int   `json:"score"`
		CreationDate int64 `json:"creation_date"`
		PostId       int64 `json:"post_id"`
		CommentId    int64 `json:"comment_id"`
	} `json:"items"`
}

As of right now we haven’t put any focus into the driving logic of this application. The focus thus far has been how we’re going to translate the API responses into something we can work with and parse within the application.

While not specifically related to the data we wish to work with, we need one more data structure:

type ErrorResponse struct {
	ErrorId      int    `json:"error_id"`
	ErrorMessage string `json:"error_message"`
	ErrorName    string `json:"error_name"`
}

There may be scenarios where the Stack Exchange API returns an error. For example, if you’ve exceeded the rate limit for the day, the API will start to return errors that are formatted like the above data structure. We just want to catch these errors rather than sit in the dark.
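To illustrate the idea, here is a small sketch that inspects a response body and turns an error payload into a Go error. The parseAPIError helper is a hypothetical name, and the sample bodies are hand-written for illustration rather than captured from the API.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ErrorResponse mirrors the error structure shown above.
type ErrorResponse struct {
	ErrorId      int    `json:"error_id"`
	ErrorMessage string `json:"error_message"`
	ErrorName    string `json:"error_name"`
}

// parseAPIError returns a non-nil error when the body carries any of the
// error fields, and nil when the body looks like a normal response.
func parseAPIError(data []byte) error {
	var e ErrorResponse
	if err := json.Unmarshal(data, &e); err != nil {
		return err
	}
	if e == (ErrorResponse{}) {
		// None of the error fields were set, so this is not an error body.
		return nil
	}
	return errors.New(e.ErrorName + ": " + e.ErrorMessage)
}

func main() {
	// Hand-written sample bodies, assumed for illustration.
	bad := []byte(`{"error_id":502,"error_message":"too many requests","error_name":"throttle_violation"}`)
	good := []byte(`{"items":[],"has_more":false}`)
	fmt.Println(parseAPIError(bad))
	fmt.Println(parseAPIError(good))
}
```

The comparison against the zero value works because ErrorResponse contains only comparable fields.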

Making Requests to the Stack Overflow API with HTTP

Now that we know the response format, we can focus on making requests to the API. Since this is a standard REST API, we can construct and execute our requests in a fairly simple fashion.

Take the following GetUnansweredQuestions function:

func GetUnansweredQuestions(tag string, page int) (QuestionsResponse, error) {
	endpoint, _ := url.Parse("https://api.stackexchange.com/2.2/questions/unanswered")
	queryParams := endpoint.Query()
	queryParams.Set("tagged", tag)
	queryParams.Set("site", "stackoverflow")
	queryParams.Set("page", strconv.Itoa(page))
	queryParams.Set("pagesize", "100")
	endpoint.RawQuery = queryParams.Encode()
	response, err := http.Get(endpoint.String())
	if err != nil {
		return QuestionsResponse{}, err
	} else {
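		// Close the body once we are done reading so the connection can be reused.
		defer response.Body.Close()
		data, err := ioutil.ReadAll(response.Body)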
		if err != nil {
			return QuestionsResponse{}, err
		}
		var errorResponse ErrorResponse
		json.Unmarshal(data, &errorResponse)
		if errorResponse != (ErrorResponse{}) {
			return QuestionsResponse{}, errors.New(errorResponse.ErrorName + ": " + errorResponse.ErrorMessage)
		}
		var questions QuestionsResponse
		json.Unmarshal(data, &questions)
		if questions.HasMore {
			// Recurse into the same endpoint to fetch the next page.
			recursiveQuestions, err := GetUnansweredQuestions(tag, page+1)
			if err != nil {
				return QuestionsResponse{}, err
			}
			questions.Items = append(questions.Items, recursiveQuestions.Items...)
		}
		return questions, nil
	}
}

The endpoint expects certain query parameters. For example, we need to specify that we want data from Stack Overflow and not one of the other Stack Exchange sites. We also specify the tag to search for and the current page of data. Each response holds at most the requested page size (100 here), so we need to paginate to get everything. The function accepts a page number, which allows it to call itself recursively while stepping through the pages.

After executing the request, we check for errors, both in the request and in the body. If there are no errors, we unmarshal the data and use the has_more field to check whether we’re on the last page. If we’re not on the last page, we increase the page number and call the function again. The items from each call are appended to those of the previous one.
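The same has_more logic can also be written as a loop instead of recursion. The sketch below is one possible alternative, not the approach used in this tutorial. To keep it runnable without network access, the page fetch is passed in as a function, and the page type and fetchAll name are hypothetical.

```go
package main

import "fmt"

// page is a hypothetical stand-in for one decoded API page.
type page struct {
	Items   []string
	HasMore bool
}

// fetchAll keeps requesting pages from fetch, starting at page 1, until a
// page reports HasMore == false or fetch returns an error.
func fetchAll(fetch func(n int) (page, error)) ([]string, error) {
	var all []string
	for n := 1; ; n++ {
		p, err := fetch(n)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Items...)
		if !p.HasMore {
			return all, nil
		}
	}
}

func main() {
	// fake simulates a three-page result set without touching the network.
	fake := func(n int) (page, error) {
		return page{Items: []string{fmt.Sprintf("question-%d", n)}, HasMore: n < 3}, nil
	}
	items, err := fetchAll(fake)
	fmt.Println(items, err)
}
```

A loop avoids growing the call stack for tags with many pages, at the cost of being slightly less compact than the recursive version.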

The GetNoAnswerQuestions function is more or less the same, but it targets a different endpoint:

func GetNoAnswerQuestions(tag string, page int) (QuestionsResponse, error) {
	endpoint, _ := url.Parse("https://api.stackexchange.com/2.2/questions/no-answers")
	queryParams := endpoint.Query()
	queryParams.Set("tagged", tag)
	queryParams.Set("site", "stackoverflow")
	queryParams.Set("page", strconv.Itoa(page))
	queryParams.Set("pagesize", "100")
	endpoint.RawQuery = queryParams.Encode()
	response, err := http.Get(endpoint.String())
	if err != nil {
		return QuestionsResponse{}, err
	} else {
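		// Close the body once we are done reading so the connection can be reused.
		defer response.Body.Close()
		data, err := ioutil.ReadAll(response.Body)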
		if err != nil {
			return QuestionsResponse{}, err
		}
		var errorResponse ErrorResponse
		json.Unmarshal(data, &errorResponse)
		if errorResponse != (ErrorResponse{}) {
			return QuestionsResponse{}, errors.New(errorResponse.ErrorName + ": " + errorResponse.ErrorMessage)
		}
		var questions QuestionsResponse
		json.Unmarshal(data, &questions)
		if questions.HasMore {
			recursiveQuestions, err := GetNoAnswerQuestions(tag, page+1)
			if err != nil {
				return QuestionsResponse{}, err
			}
			questions.Items = append(questions.Items, recursiveQuestions.Items...)
		}
		return questions, nil
	}
}

If we wanted to, we could probably combine the two functions and add some conditional logic to see if we wanted unanswered questions versus no answer questions. Where things get different is in the GetComments function:

func GetComments(questionId int64) (CommentsResponse, error) {
	endpoint, _ := url.Parse("https://api.stackexchange.com/2.2/questions/" + strconv.FormatInt(questionId, 10) + "/comments")
	queryParams := endpoint.Query()
	queryParams.Set("site", "stackoverflow")
	queryParams.Set("pagesize", "100")
	endpoint.RawQuery = queryParams.Encode()
	response, err := http.Get(endpoint.String())
	if err != nil {
		return CommentsResponse{}, err
	} else {
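		// Close the body once we are done reading so the connection can be reused.
		defer response.Body.Close()
		data, err := ioutil.ReadAll(response.Body)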
		if err != nil {
			return CommentsResponse{}, err
		}
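		// Surface API-level errors (such as throttling) instead of returning an empty result.
		var errorResponse ErrorResponse
		json.Unmarshal(data, &errorResponse)
		if errorResponse != (ErrorResponse{}) {
			return CommentsResponse{}, errors.New(errorResponse.ErrorName + ": " + errorResponse.ErrorMessage)
		}
		var comments CommentsResponse
		json.Unmarshal(data, &comments)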
		return comments, nil
	}
}

With the GetComments function, we have a different endpoint with different query parameter expectations. The endpoint is paginated like the others, but I’ve chosen to request only a single page of up to 100 comments. I think it’d be a rare scenario to find a question with more than 100 comments. While it could happen, we’re not going to think too hard on it.

Beyond the endpoint and query parameter differences, the setup is more or less the same. We construct a request, we execute the request, and we return the response. That is the beauty of working with REST APIs and HTTP.
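Because the two question functions differ only in the endpoint path, they could be folded into one, as mentioned earlier. The sketch below shows just the part that would change, with the path pulled into a parameter. The questionsURL helper is a hypothetical name, and the request and decoding logic would stay as in the functions above.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// questionsURL builds the request URL for either question listing.
// kind is expected to be "unanswered" or "no-answers", the two paths used above.
func questionsURL(kind, tag string, page int) (string, error) {
	if kind != "unanswered" && kind != "no-answers" {
		return "", fmt.Errorf("unsupported question listing: %q", kind)
	}
	endpoint, err := url.Parse("https://api.stackexchange.com/2.2/questions/" + kind)
	if err != nil {
		return "", err
	}
	q := endpoint.Query()
	q.Set("tagged", tag)
	q.Set("site", "stackoverflow")
	q.Set("page", strconv.Itoa(page))
	q.Set("pagesize", "100")
	// Encode sorts the parameters by key and escapes their values.
	endpoint.RawQuery = q.Encode()
	return endpoint.String(), nil
}

func main() {
	for _, kind := range []string{"unanswered", "no-answers"} {
		u, err := questionsURL(kind, "nativescript", 1)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(u)
	}
}
```

Validating the kind up front keeps a typo from silently turning into a request against some other route.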

Parsing and Exporting Responses to a CSV for Analysis

So we’ve got the response models and the requests to the API. To finish things off, we need to bring everything together and parse the data into something that makes sense as a comma separated value (CSV) file.

To get utility out of this application, it probably isn’t a good idea to hard-code our data needs. Instead, we’re going to let the user provide data via command line flags when running the application.

func main() {
	output := flag.String("output", "output.csv", "CSV file path for Stack Overflow output")
	tag := flag.String("tag", "nativescript", "Tag to search Stack Overflow for")
	flag.Parse()
	fmt.Println("Starting the application...")
}

The above code says that we are expecting two possible flags, both containing default values in case they are not provided. One flag is an output path for the CSV file and the other is the tag we want to search for.

Continuing in the main function, the next step is to create a file to export to and make requests to the API:

csvFile, err := os.Create(*output)
if err != nil {
	fmt.Println(err)
	return
}
defer csvFile.Close()
unansweredQuestions, err := GetUnansweredQuestions(*tag, 1)
if err != nil {
	fmt.Println(err)
	return
}
fmt.Println("Unanswered questions found: " + strconv.Itoa(len(unansweredQuestions.Items)))
noAnswerQuestions, err := GetNoAnswerQuestions(*tag, 1)
if err != nil {
	fmt.Println(err)
	return
}
fmt.Println("No answer questions found: " + strconv.Itoa(len(noAnswerQuestions.Items)))
questions := append(unansweredQuestions.Items, noAnswerQuestions.Items...)
writer := csv.NewWriter(csvFile)
writer.Write([]string{"Title", "Link", "Tags", "Answered", "Answer Count", "View Count", "Comment Count", "Creation Date"})

In the above code we make requests to both the unanswered and no-answers endpoints. Since both responses share the same data model, we combine them at the end.

The API for questions does not include statistics for comments, which is why we needed to have a function for getting comments. However, it doesn’t make sense to include the comments in our CSV. Instead, it makes sense to get a comment count for the spreadsheet.

for _, question := range questions {
	comments, err := GetComments(question.QuestionId)
	if err != nil {
		fmt.Println(err)
		return
	}
	data := []string{question.Title, question.Link, strings.Join(question.Tags, "/"), strconv.FormatBool(question.IsAnswered), strconv.Itoa(question.AnswerCount), strconv.Itoa(question.ViewCount), strconv.Itoa(len(comments.Items)), time.Unix(question.CreationDate, 0).Format(time.RFC3339)}
	writer.Write(data)
}

The above code loops through each of the questions, whether it is unanswered or has no answers. For every question, we take the id and use it to get the comments for that question. The CSV writer for Go expects a slice of strings for every line of the CSV, so we create a slice using only the information we want.

The writer buffers its output, so to make sure everything reaches the file, we can include the following outside of the loop:

writer.Flush()
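The write and flush steps can be tried in isolation by writing to an in-memory buffer instead of a file. This is a sketch under assumptions: the row type and toCSV helper are hypothetical, only a subset of the columns is shown, and the timestamp is converted to UTC so the output does not depend on the local time zone.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// row is a hypothetical, flattened view of some of the exported question fields.
type row struct {
	Title        string
	Tags         []string
	CommentCount int
	CreationDate int64 // Unix seconds, as returned by the API
}

// toCSV writes a header plus one record per row and returns the CSV text.
func toCSV(rows []row) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Write([]string{"Title", "Tags", "Comment Count", "Creation Date"})
	for _, r := range rows {
		w.Write([]string{
			r.Title,
			strings.Join(r.Tags, "/"),
			strconv.Itoa(r.CommentCount),
			time.Unix(r.CreationDate, 0).UTC().Format(time.RFC3339),
		})
	}
	// Flush pushes buffered records to buf; Error reports any write failure.
	w.Flush()
	return buf.String(), w.Error()
}

func main() {
	out, err := toCSV([]row{
		{Title: "Go, JSON and CSV", Tags: []string{"go", "csv"}, CommentCount: 2, CreationDate: 0},
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```

Note that the writer quotes the title automatically because it contains a comma, so question titles do not need any manual escaping.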

If everything went well, a CSV should be created when you run the application. If you wish to run the application without first building it, you can execute the following:

go run *.go -output output.csv -tag couchbase

The above command would get all the unanswered questions and questions with no answers for the couchbase tag. Using the CSV that was generated, you can look for all the questions that need attention.

Conclusion

You just saw how to build a Go application that makes use of the Stack Exchange APIs. More specifically you saw how to consume question and comment data from Stack Overflow, parse it, and export it as CSV data.

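The code I demonstrated was part of a real application I developed for tracking questions so they can be passed easily to the appropriate people. Keep in mind that the application makes one comments request for every question it finds, so a busy tag can use up the daily request quota of the Stack Exchange APIs in a single run. You probably won’t be able to run it more than once per day; the quota_remaining field in each response shows how many requests are left.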

Nic Raboy

Nic Raboy is an advocate of modern web and mobile development technologies. He has experience in C#, JavaScript, Golang and a variety of frameworks such as Angular, NativeScript, and Unity. Nic writes about his development experiences related to making web and mobile development easier to understand.