Preprocessing Unstructured Data for LLM Applications