AI models are getting steadily better at extracting factual information from sources. They can now process vast amounts of data and produce accurate outputs, but there's always the risk of a model hallucinating: making up information or producing incomplete and inaccurate results.
So how do we ensure that models give us accurate answers? Models draw on information in two ways: "weights" and "context." Weights are the knowledge baked in during training; they can store a lot of information, but recalling it is lossy and can lead to inaccuracies. Context is the information supplied at inference time; it's current and verifiable, and it leads to more accurate outputs. However, context length is currently limited, which makes it difficult to work with lengthy documents or to remember what happened earlier in a long conversation.
But don't worry: researchers are already working hard to increase context length and to combine it with search, retrieving relevant documents and placing them directly in the context, to improve the accuracy and reliability of models. As a result, we expect a significant leap in truthfulness, and the gap between machines and humans will continue to shrink. As models become more factual, they'll be able to automate more tasks, making our lives more convenient and efficient.
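To make the "context plus search" idea concrete, here's a minimal sketch of retrieval-augmented prompting. The `search` and `complete` functions are hypothetical stand-ins for whatever search index and LM API you use; the shape of the pattern is what matters, not any specific library.

```python
def answer_with_context(question: str, search, complete) -> str:
    """Ground the model in retrieved text instead of its weights alone.

    Hypothetical stand-ins (not a real library API):
      search(query, k) -> list of relevant text snippets
      complete(prompt) -> the LM's text completion
    """
    # Pull fresh, relevant facts into the context window rather than
    # relying on whatever the model memorized during training.
    snippets = search(question, k=3)
    sources = "\n\n".join(snippets)

    prompt = (
        "Answer the question using ONLY the sources below. "
        "If the sources don't contain the answer, say you don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )
    return complete(prompt)
```

The refusal instruction is what turns the extra context into a factuality win: the model is asked to read what's in front of it rather than guess from its weights.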
What if machines could understand visual data as effortlessly as humans? Many tasks can't be represented as plain text, such as navigating user interfaces or interpreting formatted documents with images. These tasks have been hard to automate because machines have struggled to comprehend visual data, but a breakthrough is emerging.
Multimodal models are making this breakthrough possible by letting machines understand photos and videos alongside text. With these models, tasks that were previously impractical to automate, such as document processing and UI automation, are becoming feasible with impressive accuracy.
Early efforts in this field, like LayoutLM and Salesforce's BLIP-2 models, are already demonstrating this potential, and as these models mature, we can expect even more tasks to be automated. The future of human-machine collaboration is already here, and it will unlock automation opportunities in fields such as healthcare, finance, and beyond.
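Since BLIP-2 ships as open checkpoints, here's a rough sketch of asking a question about a document image with Hugging Face's `transformers` library. The file name and question are made up for illustration; treat this as a sketch, not a production document-processing pipeline.

```python
# Visual question answering over a document scan with BLIP-2.
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

# Illustrative inputs: any image of a formatted document works here.
image = Image.open("invoice.png").convert("RGB")
prompt = "Question: what is the total amount due? Answer:"

inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```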
In the near future, language models (LMs) are poised to become powerful agents that can carry out multi-step tasks with minimal explicit instruction. With the ability to understand information whether it appears as plain text or inside images and formatted documents, the field is on the brink of a transformative breakthrough that will revolutionize the way we work.
Current LMs have limitations when it comes to working seamlessly with external tools, but ongoing projects such as Toolformer, LangChain, and Adept.ai are tackling this challenge. As these efforts mature, LMs will be able to automate complex tasks by calling out to external tools rather than doing everything from memory.
Imagine a scenario where you simply tell your LM to "book a trip to London for the conference next weekend, but ensure the departure flight isn't a red-eye," and the LM agent handles the rest. This kind of automated task will soon become a reality, streamlining workflows and making processes such as fixing billing issues with customer accounts much easier.
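As a rough illustration of how such an agent could work under the hood, here's a minimal sketch of a tool-calling loop. Everything here is hypothetical: `search_flights` and `book_flight` are toy stand-ins for real airline APIs, and `complete` is a stand-in for an LM call that returns either a JSON tool request or a final answer. Frameworks like LangChain implement a more robust version of this same loop.

```python
import json

# Toy tools; a real agent would call actual airline and booking APIs.
def search_flights(destination: str, date: str) -> list:
    return [{"flight": "XX101", "departs": "09:30"},
            {"flight": "XX999", "departs": "23:45"}]  # the red-eye

def book_flight(flight: str) -> str:
    return f"Booked {flight}."

TOOLS = {"search_flights": search_flights, "book_flight": book_flight}

def run_agent(goal: str, complete, max_steps: int = 5) -> str:
    """Act/observe loop: the LM picks the next tool call, the harness
    executes it, and the result is fed back into the context.

    `complete(transcript)` is a hypothetical LM call returning JSON:
    either {"tool": name, "args": {...}} or {"answer": "..."}.
    """
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        step = json.loads(complete(transcript))
        if "answer" in step:
            return step["answer"]
        # Execute the requested tool and append the observation.
        result = TOOLS[step["tool"]](**step["args"])
        transcript += f"Called {step['tool']} -> {result}\n"
    return "Stopped after max_steps without finishing."
```

The key design choice is that the model never executes anything itself; it only proposes actions, which keeps a human or a policy layer in a position to veto a bad booking before it happens.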
This breakthrough will undoubtedly shape the future of LMs, marking a major step in their development and ushering in a new era of autonomous technology. With the ability to automate tasks end to end through external tools, LMs will be at the forefront of the next wave of technology that changes the way we live and work.
As models continue to improve over the next few years, we'll see a significant increase in the automation of procedural work, driven by better reliability, the ability to reason over context, and the multimodal and tool-use capabilities described above. Even without significant gains in raw intelligence, automation will become increasingly prevalent in computer-based tasks where there is plenty of unlabeled data to learn from.
Physical tasks in the real world will be harder to automate, but it's just a matter of time before most people's jobs turn into supervisory ones: monitoring models and judging their output. This shift toward automation means that job roles will require a more managerial approach rather than a hands-on one. It's a bold prediction, but one worth taking seriously.
If you're interested in learning more about how automation can benefit your company and how you can manage the shift toward automation, reach out! We’re happy to share our thoughts on where we see it heading and help you prepare.