@article{a13122024ijsea13121002,
  author   = {Anton Novikau},
  title    = {Online vs. Offline {LLM} Inference: Unlocking the Best of Both Worlds in Mobile Applications},
  journal  = {International Journal of Science and Engineering Applications (IJSEA)},
  volume   = {13},
  number   = {12},
  pages    = {5--8},
  year     = {2024},
  abstract = {The integration of large language models (LLMs) into mobile applications has opened new horizons in natural language processing tasks. However, developers face a critical choice between online (cloud-based) and offline (on-device) inference. This paper explores the technical considerations, advantages, and disadvantages of both approaches. We analyze their impact on performance, privacy, resource utilization, and user experience, and discuss hybrid methods that aim to combine the strengths of both online and offline inference. A comparative analysis is presented as a table summarizing the key factors. Our findings help developers make informed decisions when integrating LLMs into mobile applications.}
}