Sam Altman: Size of LLMs won’t matter as much moving forward

If an application requires minimal latency, we need to use more chips and partition the model across as many of them as possible. Smaller batch sizes usually achieve lower latency, but they also result in poorer utilization, leading to higher overall cost per...
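A minimal back-of-the-envelope sketch of that batch-size trade-off, assuming an illustrative utilization curve and made-up chip numbers (the saturation constant, peak throughput, and hourly chip cost are not from the article):

```python
# Illustrative model of the latency vs. utilization/cost trade-off described above.
# All constants and the utilization curve are assumptions, not measured values.

def utilization(batch_size: int, half_saturation: int = 32) -> float:
    """Assumed curve: hardware utilization grows with batch size and saturates."""
    return batch_size / (batch_size + half_saturation)

def cost_and_latency(batch_size: int,
                     peak_tokens_per_s: float = 10_000.0,
                     chip_cost_per_hour: float = 2.0) -> tuple[float, float]:
    """Return (cost in $ per 1M tokens, latency in s for a 100-token reply)."""
    effective_tps = peak_tokens_per_s * utilization(batch_size)
    cost_per_token = (chip_cost_per_hour / 3600.0) / effective_tps
    # Each request shares the chip with batch_size - 1 others, so its
    # effective decode rate is effective_tps / batch_size.
    latency = 100 * batch_size / effective_tps
    return cost_per_token * 1e6, latency

for b in (1, 8, 64, 256):
    cost, lat = cost_and_latency(b)
    print(f"batch={b:4d}  cost/1M tok=${cost:7.3f}  latency={lat:5.2f}s")
```

Under these assumed numbers, batch size 1 gives sub-second latency but a cost per token roughly 30x higher than batch size 256, which is the tension the excerpt points at.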