Evaluating and Debugging Generative AI Models Using Weights and Biases