当前,人类每天创造出约1.77亿TB的视频数据,累计时长足以从史前期间一连播放到现在。怎样准确评判视频质量,并分身成本和体验,让有限的带宽和算力真正用在“刀刃”上,成为行业的一大困难。
来源:https://www.infoq.cn/article/gc3oNgtmlcZTr0cAqr9y
喜发新模子,却被众嘲是停业“前兆”!Stability AI “最强”模子人形绘制太“阴间”,网友:因为研发太讲武德
6月12日,Stability AI 推出了 Stable Diffusion 3 Medium,这家英国初创公司称其为“迄今为止最先进的文本到图像开放模子”。
来源:https://www.infoq.cn/article/29AtySiZV6MB129O6Xxe
美图奇想大模子进阶至V5,一口气发布6款新品喊话友商:快来抄作业
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs
with Nothing High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.
来源:http://arxiv.org/abs/2406.08464v1
OLMES: A Standard for Language Model Evaluations
Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models in particular is challenging, as small changes to how a model is evaluated on a task can lead to large changes in measured performance. There is no common standard setup, so different models are evaluated on the same tasks in different ways, leading to claims about which models perform best not being reproducible. We propose OLMES, a completely documented, practical, open standard for reproducible LLM evaluations. In developing this standard, we identify and review the varying factors in evaluation practices adopted by the community - such as details of prompt formatting, choice of in-context examples, probability normalizations, and task formulation. In particular, OLMES supports meaningful comparisons between smaller base models that require the unnatural "cloze" formulation of multiple-choice questions against larger models that can utilize the original formulation. OLMES includes well-considered recommendations guided by results from existing literature as well as new experiments investigating open questions.
来源:http://arxiv.org/abs/2406.08446v1
Dynamic Retrieval Augmented Generation of Ontologies using Artificial
Intelligence (DRAGON-AI) Background: Ontologies are fundamental components of informatics infrastructure in domains such as biomedical, environmental, and food sciences, representing consensus knowledge in an accurate and computable form. However, their construction and maintenance demand substantial resources and necessitate substantial collaboration between domain experts, curators, and ontology experts. We present Dynamic Retrieval Augmented Generation of Ontologies using AI (DRAGON-AI), an ontology generation method employing Large Language Models (LLMs) and Retrieval Augmented Generation (RAG). DRAGON-AI can generate textual and logical ontology components, drawing from existing knowledge in multiple ontologies and unstructured text sources. Results: We assessed performance of DRAGON-AI on de novo term construction across ten diverse ontologies, making use of extensive manual evaluation of results. Our method has high precision for relationship generation, but has slightly lower precision than from logic-based reasoning. Our method is also able to generate definitions deemed acceptable by expert evaluators, but these scored worse than human-authored definitions. Notably, evaluators with the highest level of confidence in a domain were better able to discern flaws in AI-generated definitions. We also demonstrated the ability of DRAGON-AI to incorporate natural language instructions in the form of GitHub issues. Conclusions: These findings suggest DRAGON-AI's potential to substantially aid the manual ontology construction process. However, our results also underscore the importance of having expert curators and ontology editors drive the ontology generation process.
来源:http://arxiv.org/abs/2312.10904v2
Tailoring Generative AI Chatbots for Multiethnic Communities in Disaster
Preparedness Communication: Extending the CASA Paradigm This study is among the first to develop different prototypes of generative AI (GenAI) chatbots powered by GPT 4 to communicate hurricane preparedness information to diverse residents. Drawing from the Computers Are Social Actors (CASA) paradigm and the literature on disaster vulnerability and cultural tailoring, this study conducted a between-subjects experiment with 441 Black, Hispanic, and Caucasian residents of Florida. A computational analysis of chat logs (N = 7,848) shows that anthropomorphism and personalization are key communication topics in GenAI chatbot-user interactions. SEM results (N = 441) suggest that GenAI chatbots varying in tone formality and cultural tailoring significantly predict bot perceptions and, subsequently, hurricane preparedness outcomes. These results highlight the potential of using GenAI chatbots to improve diverse communities' disaster preparedness.
来源:http://arxiv.org/abs/2406.08411v1
Large Language Models Must Be Taught to Know What They Don't Know
When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also the uncertainty of other models. Lastly, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings through a user study.
来源:http://arxiv.org/abs/2406.08391v1
齐思
B&W Manga Block 项目 是 Hugging Face 上的一个模子,专门用于创建粗线条的肖像插图 。该模子在单色和简单的提示下效果最佳,权重以 Safetensors 格式提供,用户可以从“文件和版本”选项卡中下载权重。对于数字艺术爱好者,尤其是漫画风格插图的爱好者,这个模子可以或许轻松天生特定漫画美学的艺术作品。
Stable Diffusion 3 Medium 是 Stability AI 开辟的一种多模态扩散变压器(MMDiT)文本到图像模子。该模子适用于天生艺术品、设计和其他艺术过程,通过三个高级预练习文本编码器提拔妥当性能。尽管不适用于天生真实人物或变乱的内容,但它在图像质量和复杂提示明白方面体现优异,而且免费向非商业用途开放利用。