Designing Flexible Spreadsheet Uploads with AI


For a long time, we've had a common paradigm for our views that expect spreadsheet/CSV file uploads.

The view template provides a definitive list of the expected columns. Each column has a specific header expected to be present in the sheet and a description of what that column should contain. A downloadable template file gives your users somewhere to start formatting data appropriately for upload.

More technical users will understand why that structure is important. Computers are dumb - they don't understand files unless given specific instructions on how to process and analyze them.

Less technical users will find friction because we've just introduced more work. Why can't the computer handle any upload I give it? Sure, one or two details are a little different, but the file has all the information needed.

Why not leverage what large language models (LLMs) do well to make our upload expectations more flexible?

While LLMs aren't suited to analyzing data, they are good at producing structured outputs, and that can streamline the user experience.

Can LLMs Improve File Uploads?

Here's the case I was working with: a client wanted external users to be able to upload a spreadsheet containing addresses. Regardless of other information in the sheet, we wanted to pin those addresses on a map.

If you're a developer, your mind immediately fills with all the ways this can break. What are the address columns named? Is the address in a single column, or spread across several? If several, what order are the columns in?

Curious whether the premise would work, I set about testing it in the most direct way possible: upload the file to ChatGPT and prompt the LLM to provide a structured address output in response.

And what did I find out? Well, not to put too fine a point on it, but ChatGPT is terrible at analyzing files, no matter how specific my prompt became.

Sometimes I got a perfectly valid response, but it wasn't repeatable. I received the expected output in one out of every four attempts at best.

Now, if I'm prompting manually in the chat interface and evaluating the results myself, that's workable - I have reason to retry prompts that didn't work the first time.

From an application automation perspective, though, one out of four is a terrible rate. How would I find signal out of the noise and know which response was correct? What if the response only included some records but not all?

It's not hard to see why the LLM doesn't work well for this use case. A language model generates predictive text; it doesn't know how to read files. It does have coding tools at its disposal, though, if it generates the code to drive them.

So ChatGPT writes some code to open and read the file, with varying degrees of success. Then, through that layer of indirection, it applies my original prompt to try to mine the data for specific information.

A New Approach: Read Data Before Prompting

Admitting defeat on that tack, I tried a different one. I can't generalize the supported file types too broadly. But if I'm limited to XLSX and CSV, I have the tools to read those in my wheelhouse.
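Reading those two formats is well-trodden ground in Python. Here's a minimal sketch - the helper name is mine, and real code would want error handling - using openpyxl for XLSX and the standard library's csv module:

```python
import csv
import io

from openpyxl import load_workbook  # third-party: pip install openpyxl


def read_rows(filename: str, data: bytes) -> list[list[str]]:
    """Normalize an uploaded XLSX or CSV file into a list of rows."""
    if filename.lower().endswith(".xlsx"):
        workbook = load_workbook(io.BytesIO(data), read_only=True)
        sheet = workbook.active
        return [
            ["" if cell is None else str(cell) for cell in row]
            for row in sheet.iter_rows(values_only=True)
        ]
    # Otherwise assume CSV; utf-8-sig strips a byte-order mark if present.
    return list(csv.reader(io.StringIO(data.decode("utf-8-sig"))))
```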

With the file's data in hand, we can drop the CSV text directly into the chat completion prompt ourselves. This removes the indirection, takes control of the file handling, and allows the LLM to do what it does best: natural language processing.
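Here's a sketch of what that might look like with the OpenAI Python SDK's chat completions API. The model name, prompt wording, and JSON shape are my assumptions; JSON mode just guarantees the reply parses:

```python
import csv
import io
import json

from openai import OpenAI  # third-party: pip install openai

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = (
    "The user message contains CSV data. Find every street address in it and "
    'respond with JSON shaped like {"addresses": [{"street": "...", '
    '"city": "...", "state": "...", "zip": "..."}]}. Respond with JSON only.'
)


def extract_addresses(rows: list[list[str]]) -> list[dict]:
    """Serialize rows back to CSV text and prompt for structured output."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any JSON-mode-capable model works
        response_format={"type": "json_object"},  # force a parseable reply
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": buffer.getvalue()},
        ],
    )
    return json.loads(response.choices[0].message.content)["addresses"]
```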

Of course, we have context limits to deal with. Our code can't just embed an arbitrarily sized file's contents into a single prompt. We need to go one step further and chunk the data into batches that will fit in the window.
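A rough way to batch, using a flat row count as a stand-in for a real token estimate:

```python
from collections.abc import Iterator


def batch_rows(
    rows: list[list[str]], batch_size: int = 200
) -> Iterator[list[list[str]]]:
    """Yield the rows in fixed-size batches.

    A production version would estimate tokens per batch (e.g. with
    tiktoken) against the model's context window instead of assuming
    a flat row count.
    """
    for start in range(0, len(rows), batch_size):
        yield rows[start : start + batch_size]
```

Stitched together with the earlier sketches, the whole pipeline reads the rows, batches them, and collects the extracted addresses from each chat completion.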

What did I learn from this tactic? It achieves a much higher success rate - close to 100%. I found this to be true even when the columns were scrambled.

I thought perhaps the column headers were giving the LLM better context, so I tried a run without any column headings at all. Surprise, surprise - it still gave me valid results.

Enhancing UX with AI-Assisted Uploads

These tactics will only apply to a limited set of use cases. Let's not start uploading a month's worth of financial information this way and expect it to work for complex analysis. But where the opportunity exists, we can leverage AI tools to make the UX smoother.

Originally published on 2025-03-12 by Matt Lewellyn
