AWS AI Practitioner
A company has developed a large language model (LLM) and wants to make the LLM available to multiple internal teams. Which inference mode should be used for a chatbot that needs to process real-time user queries with minimal latency?
A
Batch transform
B
Real-time inference
✓ Correct
C
Asynchronous inference
D
Serverless inference
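The four options match the Amazon SageMaker inference choices. Batch transform scores whole datasets offline. Asynchronous inference queues requests that have large payloads or long processing times. Serverless inference scales to zero between requests and can incur cold starts. Real-time inference keeps a persistent endpoint running for low-latency responses, so it fits an interactive chatbot. The Python sketch below encodes that rule of thumb to show why option B wins. The `Workload` fields and the `choose_inference_mode` function are hypothetical, and the rules are a simplification: real deployments also weigh cost and traffic patterns.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical description of a workload's requirements (illustrative only).
    offline_dataset: bool = False       # score a whole dataset at once, no live callers
    large_or_long_request: bool = False  # large payloads or long processing per request
    needs_low_latency: bool = False      # a user waits on each response, no cold starts


def choose_inference_mode(w: Workload) -> str:
    """Simplified rule of thumb mapping a workload to a SageMaker inference option."""
    if w.offline_dataset:
        return "Batch transform"
    if w.large_or_long_request:
        return "Asynchronous inference"
    if w.needs_low_latency:
        return "Real-time inference"
    # Intermittent traffic that can tolerate occasional cold starts.
    return "Serverless inference"


# A chatbot answering live user queries needs minimal latency.
chatbot = Workload(needs_low_latency=True)
print(choose_inference_mode(chatbot))
```

The checks are ordered from the most restrictive workload shape to the least restrictive, so a request that is both offline and large is treated as batch work.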