Faking Data with Go
Very often I need to benchmark data store operations for customers. Standard benchmarks such as TPC-C for transactional workloads and TPC-H for analytical workloads are useful, but writing a custom benchmark in Go is fairly simple. The catch is that a custom benchmark needs sample data, and GoFakeIt makes generating it easy.
GoFakeIt is very fast and supports a wide range of data types, from credit card numbers to email addresses and even fake beer information.
You could generate data on the fly as part of the benchmark, but I often want to run specific queries against a known dataset, so it is useful to generate the data once and store it as JSON as an intermediate step.
Let’s get started by installing the Go library (run this inside a Go module; create one with go mod init if you don’t have one yet):
$ go get github.com/brianvoe/gofakeit/v6
Set up the package and the imports:

package main

import (
	"encoding/json"
	"os"

	"github.com/brianvoe/gofakeit/v6"
)
We can create a struct to hold our employee record. The tag fake:"{uuid}" calls GoFakeIt’s UUID function, and the tag json:"uuid" sets the JSON key to uuid. DigitalOcean has an excellent tutorial on using struct tags in Go.
type Record struct {
	UUID      string `fake:"{uuid}" json:"uuid"`
	UserID    string `fake:"{username}" json:"username"`
	FirstName string `fake:"{firstname}" json:"firstname"`
	LastName  string `fake:"{lastname}" json:"lastname"`
	Company   string `fake:"{company}" json:"company"`
	Title     string `fake:"{jobtitle}" json:"title"`
	Email     string `fake:"{email}" json:"email"`
	Cell      string `fake:"{phoneformatted}" json:"cellphone"`
	Years     int    `fake:"{number:1,10}" json:"tenure"`
}

func main() {
	// Create a variable of the struct type; it is refilled on every iteration
	var r Record

	// Open the output file (os.Create truncates it if it already exists)
	f, err := os.Create("data.json")
	if err != nil {
		panic(err)
	}

	// The encoder writes each record as compact JSON followed by a newline
	enc := json.NewEncoder(f)

	// Generate 1M records and write them to the file, one per line
	for i := 1; i <= 1000000; i++ {
		gofakeit.Struct(&r)
		if err := enc.Encode(r); err != nil {
			panic(err)
		}
	}

	// Grownups clean up after themselves
	if err := f.Close(); err != nil {
		panic(err)
	}
}
We can generate 1 million employee records in under 20 seconds (this timing includes compilation, since go run builds the program first):
$ time go run fakeupjson.go
real 0m19.031s
user 0m18.118s
sys 0m1.661s
$ wc -l data.json
1000000 data.json
Each line of data.json is a compact JSON object, and piping the first two lines through jq pretty-prints them:
$ head -2 data.json | jq
{
  "uuid": "3efcc7e4-a667-4a99-8173-bca5a66e9e6e",
  "username": "Mayer7753",
  "firstname": "Emmett",
  "lastname": "Wehner",
  "company": "TuvaLabs",
  "title": "Orchestrator",
  "email": "isomkris@leannon.io",
  "cellphone": "854-722-2761",
  "tenure": 10
}
{
  "uuid": "b65a7a73-fb1e-4a47-8fcf-a110480b3b69",
  "username": "Hartmann9159",
  "firstname": "Aniyah",
  "lastname": "Mills",
  "company": "Golden Helix",
  "title": "Producer",
  "email": "orrinlittle@nikolaus.info",
  "cellphone": "238-402-9777",
  "tenure": 5
}
Now you have a million records to load into your data store as part of your benchmark.
The full code is available as a Gist.
Happy benchmarking!