Gemini API Memory Efficient Way to Handle File Upload

Created on Aug 1, 2025  /  Updated on Aug 1, 2025    #go   #efficient   #performance  
Disclaimer: Views expressed in this software engineering blog are personal and do not represent my employer. Readers are encouraged to verify information independently.

Introduction

Hello! In this post we will design a memory-efficient way to take a file from a client and pass it on to the genai Go SDK.

Method

If we go to the documentation of the document processing API, notice that in the example they fetch the file, store it in a temporary buffer, then use client.Files.UploadFromPath to upload the file from a local path. At the time of writing, genai allows a maximum of 20 MB. The attached PDF, which is 19.7 MB, is the one I will be using in this blog.

This is problematic: with 100 concurrent users, that means up to 2000 MB of memory in use! To overcome this challenge, there is another API, not documented in the link above, which is Upload:

func (m Files) Upload(ctx context.Context, r io.Reader, config *UploadFileConfig) (*File, error)

It takes an io.Reader. That's perfect: it means we can pass anything that implements this interface, so instead of copying the file into a buffer we can just do the following:

respHttp, _ := http.Get(pdfURL)
uploadedFile, _ := client.Files.Upload(ctx, respHttp.Body, uploadConfig)
respHttp.Body.Close()

This way, instead of fetching the whole file into an in-memory buffer, we pass the Body, which then gets streamed in chunks under the hood:

func (ac *apiClient) uploadFile(ctx context.Context, r io.Reader,
	uploadURL string, httpOptions *HTTPOptions) (*File, error) {
	var offset int64 = 0
	var resp *http.Response
	var respBody map[string]any
	var uploadCommand = "upload"

	buffer := make([]byte, maxChunkSize)
	for { ... }

We can see that it loads the content from the reader in chunks of maxChunkSize, which is const maxChunkSize = 8 * 1024 * 1024 // 8 MB chunk size. So it uploads the file in 8 MB chunks instead of passing it all at once! Now, in this example we got the file from an external URL, but as stated in the introduction, we also need to take the file from the client.

Passing the file from HTTP

To parse the file from an HTTP request, we usually use FormFile. The method returns a multipart.File, which implements the io.Reader interface, so we can do the following:

file, _, _ := r.FormFile("file")
uploadedFile, err := aiClient.Files.Upload(r.Context(), file, uploadConfig)
file.Close()

I prepared an example that you can download from here. After downloading the example, grab a genai key from Google, put it in the GEMINI_API_KEY environment variable, then run make server to start the server and make upload to upload the file to it.

λ ~/code/playground/demo-stream-file/ main* make server
go run .
2025/08/01 15:18:00 Initializing AI file processor server...
2025/08/01 15:18:00 Creating Gemini AI client...
2025/08/01 15:18:00 AI client created successfully
2025/08/01 15:18:00 Starting server on localhost:8080
2025/08/01 15:18:00 Ready to accept file upload requests at /ai endpoint

Then in another console I do make upload

λ ~/code/playground/demo-stream-file/ main* make upload
sh upload.sh
Waiting for server at http://localhost:8080/healthz...
Server is up. Starting file upload...
Found A17_FlightPlan.pdf
Uploading A17_FlightPlan.pdf and waiting AI response

We can see the following output: the memory usage peaked at 41.33 MB.

[Figure: memory profile, peak usage 41.33 MB]

We can dig deeper with more advanced profiling (which I did, but didn't include in the demo for simplicity) and find that the source is FormFile: under the hood it calls

err := r.ParseMultipartForm(defaultMaxMemory)

And the default max memory is 32 MB

const defaultMaxMemory untyped int = 32 << 20 // 33554432

And remember, our file is at most 20 MB, so a rule of thumb is to set this value to 1 or 2 MB. Let's do that:

r.ParseMultipartForm(1 << 20)

This caps the in-memory portion of the parsed form at 1 MB instead of 32 MB; anything beyond that is spilled to a temporary file on disk, so the request body no longer sits fully in memory.

[Figure: memory profile, peak usage 9.33 MB]

Now the peak memory usage went down from 41.33 MB to 9.33 MB; that's approximately 77.4% memory savings per request!


Some people may say: hey, the Gemini output size is variable. Yes, but it differs by only a few bytes, so it's nearly negligible. Others may say: hey, that's only one run. You're right, so I ran it a couple of times, and the numbers are nearly the same.

Passing the file from gRPC

In gRPC it's a bit tricky. You can receive the whole file as bytes and pipe it into the Upload method, and that works, but if you need to transfer the file from the client without buffering all the bytes you can use a gRPC stream: you stream the file into an io.Pipe and pass the reader end to the Upload method.