Data cleaning and anonymizing with GPT-3.5
Note added on 2025-07-28: It’s been two years of programming with and for LLMs. This article sounds so naïve now. Say you made a website in which customers buy personalized gifts. Each gift comes with a message, written by the customer in whatever language the customer wants. The use of grammar, punctuation and capitalizations in the messages is often creative. You would like to be able to offer reasonably normative messages to your customers. You would also like to store a fully anonymized version of the messages; replace all proper names with a [proper_name] placeholder, place names with [place_name], dates with [date], times with [time], and geographical coordinates with [coordinates]. ...