OpenAI 《A practical guide to building agents》
Introduction: Large language models are becoming increasingly capable of handling complex, multi-step tasks. Advances in reasoning, multimodality, and tool use have unlocked a new category of LLM-powered systems that can act on users' behalf with a high degree of independence. Agents are systems that independently accomplish tasks on your behalf. … A workflow is a sequence of steps that must be executed to meet the user's goal, whether … [different models can be used] for different tasks in the workflow. Not every task requires the smartest model—a simple retrieval or intent classification task may be handled by a smaller, faster model, while harder tasks like deciding …
34 pages | 7.00 MB | 6 months ago
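The model-routing advice in this excerpt is easy to picture in code. Below is a minimal sketch, assuming a hypothetical pair of model identifiers and a toy WorkflowStep type (none of the names come from the guide itself): simple retrieval or intent-classification steps go to a smaller, faster model, while harder judgment calls go to a more capable one.

# Illustrative sketch, not the guide's implementation: route each workflow step
# to a model tier based on task difficulty so cheap steps use a smaller model.
from dataclasses import dataclass

# Hypothetical model identifiers; swap in whatever models you actually deploy.
SMALL_FAST_MODEL = "small-fast-model"
CAPABLE_MODEL = "large-capable-model"

# Task types the excerpt treats as simple enough for a smaller model.
SIMPLE_TASKS = {"retrieval", "intent_classification"}

@dataclass
class WorkflowStep:
    name: str
    task_type: str
    prompt: str

def pick_model(step: WorkflowStep) -> str:
    """Smaller model for simple steps, more capable model for harder ones."""
    return SMALL_FAST_MODEL if step.task_type in SIMPLE_TASKS else CAPABLE_MODEL

def run_workflow(steps: list[WorkflowStep]) -> list[tuple[str, str]]:
    plan = []
    for step in steps:
        model = pick_model(step)
        # A real agent would call the chosen model here; we only record the routing.
        plan.append((step.name, model))
    return plan

if __name__ == "__main__":
    steps = [
        WorkflowStep("classify_intent", "intent_classification", "Classify: ..."),
        WorkflowStep("decide_refund", "decision", "Should we approve this refund? ..."),
    ]
    for name, model in run_workflow(steps):
        print(f"{name} -> {model}")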
Google 《Prompt Engineering v7》
Experiment with input formats and writing styles · For few-shot prompting with classification tasks, mix up the classes · Adapt to model updates · Experiment with output formats · JSON Repair
These prompts can be used to achieve various kinds of understanding and generation tasks such as text summarization, information extraction, question and answering, text classification … five examples for few-shot prompting. However, you may need to use more examples for more complex tasks, or you may need to use fewer due to the input length limitation of your model. Table 2 shows a few-shot …
68 pages | 6.50 MB | 6 months ago
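Two of the tips quoted above (mix up the classes in few-shot classification examples, and experiment with output formats such as JSON) can be combined in a single prompt builder. The sketch below is illustrative only; the sentiment labels, example reviews, and the build_prompt helper are invented for the demo.

# Minimal sketch: a few-shot classification prompt whose examples are shuffled
# so one class does not cluster at the end, with JSON requested as the output.
import json
import random

EXAMPLES = [
    {"text": "The battery died after a week.", "label": "NEGATIVE"},
    {"text": "Works exactly as described, love it.", "label": "POSITIVE"},
    {"text": "Arrived on time, nothing special.", "label": "NEUTRAL"},
    {"text": "Customer support never replied.", "label": "NEGATIVE"},
    {"text": "Best purchase I've made this year.", "label": "POSITIVE"},
]

def build_prompt(query: str, seed: int = 0) -> str:
    examples = EXAMPLES[:]
    random.Random(seed).shuffle(examples)  # mix up the classes across examples
    lines = ['Classify the review sentiment. Respond with JSON: {"label": ...}', ""]
    for ex in examples:
        lines.append(f"Review: {ex['text']}")
        lines.append(json.dumps({"label": ex["label"]}))
        lines.append("")
    lines.append(f"Review: {query}")
    return "\n".join(lines)

print(build_prompt("The packaging was damaged but the product is fine."))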
Trends – Artificial Intelligence
… a week of productivity, and almost by definition, the time savings is coming from less valuable tasks … We were early movers in AI. But we're still in the early stages of the journey. - JP Morgan … AI-powered restaurant management platform designed to optimize store operations by automating repetitive tasks like inventory tracking, scheduling, and food preparation alerts. It leverages machine learning … ChatGPT, a new agentic capability that conducts multi-step research on the internet for complex tasks. It accomplishes in tens of minutes what would take a human many hours … Deep research marks a …
340 pages | 12.14 MB | 4 months ago
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
… as the number of parameters increases, allowing it to exhibit emergent capabilities across various tasks (Wei et al., 2022). However, the improvement comes at the cost of larger computing resources for training … DeepSeek-V2 achieves top-tier performance among open-source models. … multi-subject multiple-choice tasks while DeepSeek-V2 is comparable or better on others. Note that for the CHID benchmark, the tokenizer … we mainly include generation-based benchmarks, except for several representative multiple-choice tasks (MMLU and ARC). We also conduct an instruction-following evaluation (IFEval) (Zhou et al., 2023) for …
52 pages | 1.23 MB | 1 year ago
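For readers unfamiliar with the Mixture-of-Experts idea in the title, the toy routing layer below shows why such models can be economical: a router picks the top-k experts per token, so only a fraction of the parameters do work on any given input. This is a generic NumPy illustration, not DeepSeek-V2's actual DeepSeekMoE or attention design.

# Toy top-k MoE layer: only the selected experts are evaluated per token.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

router_w = rng.normal(size=(d_model, n_experts))           # routing weights
experts = rng.normal(size=(n_experts, d_model, d_model))   # one matrix per expert

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (d_model,) token representation -> (d_model,) output."""
    logits = x @ router_w
    top = np.argsort(logits)[-top_k:]                        # pick the top-k experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts
    # Only the selected experts run; the remaining n_experts - top_k stay idle.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (16,)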
OpenAI - AI in the Enterprise
… outputs in shorter time frames. 02 Automating routine operations: Freeing people from repetitive tasks so they can focus on adding value. 03 Powering products: By delivering more relevant and responsive … was simple: If advisors could access information faster and reduce the time spent on repetitive tasks, they could offer more and better insights to clients. They started with three model evals: 01 … lead to more stable, reliable applications that are resilient to change. Evals are built around tasks that measure the quality of the output of a model against a benchmark—is it more accurate? More …
25 pages | 9.48 MB | 5 months ago
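The eval described in this excerpt (scoring model output against a benchmark) can be prototyped in a few lines. In the sketch below, call_model is a hypothetical stand-in for a real model call and the two benchmark items are made up; the point is the accuracy-comparison loop you would rerun after every model or prompt change.

# Minimal accuracy eval: compare model answers to benchmark ground truth.
def call_model(question: str) -> str:
    # Placeholder: in practice this would call your deployed model or API.
    return "Paris" if "France" in question else "unknown"

BENCHMARK = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is the capital of Japan?", "answer": "Tokyo"},
]

def run_eval(benchmark) -> float:
    correct = 0
    for item in benchmark:
        prediction = call_model(item["question"])
        if prediction.strip().lower() == item["answer"].strip().lower():
            correct += 1
    return correct / len(benchmark)

if __name__ == "__main__":
    print(f"accuracy: {run_eval(BENCHMARK):.2f}")  # e.g. 0.50 with the stub above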
5 results in total













