[晓理紫]每日论文分享(有中文摘要,源码或项目地址)--大模型、扩散模型、视觉语言导航

专属领域论文订阅

VX关注{晓理紫},每日更新论文,如感兴趣,请转发给有需要的同学,谢谢支持

如果你感觉对你有所帮助,请关注我,每日准时为你推送最新论文。

为了答谢各位网友的支持,从今日起免费为300名读者提供订阅主题论文服务,只需VX关注公号并回复{邮箱+论文主题}(如:[email protected] + chatgpt@large language model @LLM),主题必须是同一个领域,最多三个关键词。解释权归博主所有


分类:

  • 大语言模型LLM
  • 视觉模型VLM
  • 扩散模型
  • 视觉语言导航VLN
  • 强化学习 RL
  • 模仿学习 IL
  • 机器人
  • 开放词汇,检测分割

== LLM ==

标题: Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

作者: Weizhou Shen, Chenliang Li, Hongzhan Chen

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2401.07324v2

GitHub: https://github.com/X-PLUG/Multi-LLM-Agent

中文摘要: 大型语言模型(LLM)代理显著扩展了独立LLM的功能,使它们能够与外部工具(例如,API、函数)进行交互,并以自我指导的方式完成复杂的任务。工具使用的挑战要求LLMs不仅理解用户查询并生成答案,而且在任务规划、内存管理、工具调用和结果汇总方面表现出色。虽然传统的方法集中于训练具有所有这些能力的单个LLM,但是性能限制变得明显,特别是对于较小的模型。此外,当工具更新时,整个LLM可能需要再培训。为了克服这些挑战,我们提出了一种新颖的策略,将上述功能分解为计划器、调用器和摘要器。每个组件都由一个LLM实现,该LLM专注于特定的功能,并与其他组件协作来完成任务。这种模块化框架有助于单独更新,并可能使用较小的LLM来构建每种能力。为了有效地训练这个框架,我们引入了一个两阶段的训练范式。首先,我们在整个数据集上微调主干LLM,而不区分子任务,为模型提供对任务的全面理解。其次,微调的LLM分别用于实例化计划器、调用器和摘要器,它们在各自的子任务上不断地被微调。跨各种工具使用基准的评估表明,我们提出的多LLM框架超越了传统的单LLM方法,突出了其在工具学习方面的功效和优势。

摘要: Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs, empowering them to interact with external tools (e.g., APIs, functions) and complete complex tasks in a self-directed fashion. The challenge of tool use demands that LLMs not only understand user queries and generate answers but also excel in task planning, memory management, tool invocation, and result summarization. While traditional approaches focus on training a single LLM with all these capabilities, performance limitations become apparent, particularly with smaller models. Moreover, the entire LLM may require retraining when tools are updated. To overcome these challenges, we propose a novel strategy that decomposes the aforementioned capabilities into a planner, caller, and summarizer. Each component is implemented by a single LLM that focuses on a specific capability and collaborates with other components to accomplish the task. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability. To effectively train this framework, we introduce a two-stage training paradigm. First, we fine-tune a backbone LLM on the entire dataset without discriminating sub-tasks, providing the model with a comprehensive understanding of the task. Second, the fine-tuned LLM is used to instantiate the planner, caller, and summarizer respectively, which are continually fine-tuned on respective sub-tasks. Evaluation across various tool-use benchmarks illustrates that our proposed multi-LLM framework surpasses the traditional single-LLM approach, highlighting its efficacy and advantages in tool learning.
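
下面给出一个极简的 Python 示意代码,说明摘要中描述的"规划器-调用器-摘要器"分工如何协同完成一次工具调用任务。其中 `chat()` 与工具注册表均为假设的占位符,并非论文的官方实现(官方代码见上方 GitHub 链接)。

```python
# Minimal sketch of a planner / caller / summarizer loop (illustrative only).
# `chat(model, prompt)` is a hypothetical stand-in for any LLM inference call.

def chat(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in your own LLM backend here")

TOOLS = {  # illustrative tool registry, not the paper's actual API
    "weather_api": lambda city: f"Sunny in {city}",
}

def run_agent(user_query: str, max_steps: int = 5) -> str:
    history = [f"User: {user_query}"]
    for _ in range(max_steps):
        # 1) Planner: decide the next sub-task or declare the task finished.
        plan = chat("planner-llm", "\n".join(history) + "\nNext step (or FINISH):")
        history.append(f"Plan: {plan}")
        if plan.strip().startswith("FINISH"):
            break
        # 2) Caller: turn the plan into a concrete tool invocation.
        call = chat("caller-llm", f"Plan: {plan}\nEmit: tool_name|argument")
        name, _, arg = call.partition("|")
        result = TOOLS.get(name.strip(), lambda a: f"unknown tool {name}")(arg.strip())
        history.append(f"Observation: {result}")
    # 3) Summarizer: compose the final answer from the whole trajectory.
    return chat("summarizer-llm", "\n".join(history) + "\nFinal answer:")
```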


标题: Meta Prompting for AGI Systems

作者: Yifan Zhang, Yang Yuan, Andrew Chi-Chih Yao

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2311.11482v4

GitHub: https://github.com/meta-prompting/meta-prompting

中文摘要: 本文介绍了对元提示的全面研究,这是一种创新技术,重塑了大型语言模型(LLMs)、多模态基础模型和人工智能系统在问题解决和数据交互中的应用。基于类型理论和范畴理论,元提示强调信息的结构和语法,而不是传统的以内容为中心的方法。本文探讨了元提示(MP)的正式定义,将其与少样本提示(Few-Shot Prompting)区分开来,并强调了其在各种人工智能应用中的有效性。一个关键的焦点是将元提示应用于复杂推理(MP-CR)任务,展示它如何有效地将复杂的问题解构为更简单的子问题,提高令牌效率,并实现更公平的问题解决比较,特别是与少样本提示方法相比。此外,本文还介绍了用于提示任务本身的元提示,允许LLMs以递归的、类似元编程的方式自行生成新的提示。这种方法标志着人工智能自主和适应能力的重大飞跃。本文还介绍了将元提示集成到多模态基础模型设置中,解决了在结构化元提示框架中整合各种数据类型(如图像、音频和视频)的挑战和机遇。实证实验,包括以100%的成功率解决24点游戏(Game of 24)任务,展示了MP-CR代理增强的推理能力,实现了高准确性和高效率,并展示了元提示对人工智能问题解决的变革性影响。(代码见https://github.com/meta-prompting/meta-prompting)

摘要: This paper presents a comprehensive study of Meta Prompting, an innovative technique reshaping the utilization of large language models (LLMs), multi-modal foundation models, and AI systems in problem-solving and data interaction. Grounded in type theory and category theory, Meta Prompting emphasizes the structure and syntax of information over traditional content-centric methods. The paper explores the formal definitions of Meta Prompting (MP), sets it apart from Few-Shot Prompting, and underlines its effectiveness in various AI applications. A key focus is applying Meta Prompting for complex reasoning (MP-CR) tasks, showing how it effectively deconstructs intricate problems into simpler sub-problems, enhancing token efficiency, and enabling more equitable problem-solving comparisons, especially against few-shot prompting methods. Additionally, the paper introduces Meta Prompting for prompting tasks, allowing LLMs to self-generate new prompts in a recursive, metaprogramming-like manner. This approach marks a significant leap in AI’s autonomous and adaptive capabilities. The paper also introduces the integration of Meta Prompting into multi-modal foundation model settings, tackling the challenges and opportunities of incorporating varied data types such as images, audio, and video within the structured Meta Prompting framework. Empirical experiments, including solving the Game of 24 tasks with 100% success rate, demonstrate the MP-CR Agent’s enhanced reasoning capabilities, achieving high accuracy and efficiency, and showcasing Meta Prompting’s transformative impact on AI problem-solving. (The code is available at https://github.com/meta-prompting/meta-prompting)
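
下面是一个示意性的 Python 草图,演示"先让模型按结构化模板生成任务专用提示、再用该提示求解实例"这一递归式元提示思路。`llm()` 与提示模板均为假设示例,并非论文的实际实现。

```python
# Illustrative two-stage "prompt that writes a prompt" loop in the spirit of
# Meta Prompting; `llm()` is a hypothetical completion function, and the
# template below is an example of a structure-oriented (not content-oriented) scaffold.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM backend")

META_PROMPT = """You are an expert prompt engineer.
Given a task description, output a reusable prompt with this structure:
1. Problem restatement (one sentence)
2. Step-by-step solution plan (numbered)
3. Final answer format: `Answer: <value>`
Task: {task}"""

def meta_solve(task: str, instance: str) -> str:
    # Stage 1: the model generates a structured, task-specific prompt.
    task_prompt = llm(META_PROMPT.format(task=task))
    # Stage 2: the generated prompt is applied to the concrete instance.
    return llm(task_prompt + "\n\nInput: " + instance)

# e.g. meta_solve("Game of 24: combine four numbers with + - * / to reach 24",
#                 "3 3 8 8")
```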


标题: Augmenting Math Word Problems via Iterative Question Composing

作者: Haoxiong Liu, Yifan Zhang, Yifan Luo

PubTime: 2024-01-30

Downlink: http://arxiv.org/abs/2401.09003v3

Project: https://huggingface.co/datasets/Vivacem/MMIQC

GitHub: https://github.com/iiis-ai/IterativeQuestionComposing

中文摘要: 尽管用于数学推理的大型语言模型(LLM)取得了进步,但解决竞赛级别的数学问题仍然是一个重大挑战,尤其是对于没有外部工具的开源LLM。我们引入了MMIQC数据集,它由处理过的网络数据和合成的问题-回答对混合组成,旨在增强基础语言模型的数学推理能力。在MMIQC上微调的模型,在各种模型规模下于MATH基准测试上的性能始终超过同类模型。值得注意的是,Qwen-72B-MMIQC实现了45.0%的准确率,比之前的开源最优水平高出8.2%,并超过了2023年发布的初始版本GPT-4。对匈牙利高中期末考试的广泛评估结果表明,这种改进可以推广到未见过的数据。我们对MMIQC的消融研究表明,很大一部分改进可以归因于我们的新增强方法——迭代问题合成(IQC),它使用一个LLM从种子问题迭代合成新问题,并通过另一个LLM进行拒绝采样。MMIQC数据集可在HuggingFace Hub获取:https://huggingface.co/datasets/Vivacem/MMIQC。我们的代码见https://github.com/iiis-ai/IterativeQuestionComposing。

摘要: Despite the advancements in large language models (LLMs) for mathematical reasoning, solving competition-level math problems remains a significant challenge, especially for open-source LLMs without external tools. We introduce the MMIQC dataset, comprising a mixture of processed web data and synthetic question-response pairs, aimed at enhancing the mathematical reasoning capabilities of base language models. Models fine-tuned on MMIQC consistently surpass their counterparts in performance on the MATH benchmark across various model sizes. Notably, Qwen-72B-MMIQC achieves a 45.0% accuracy, exceeding the previous open-source state-of-the-art by 8.2% and outperforming the initial version GPT-4 released in 2023. Extensive evaluation results on Hungarian high school finals suggest that such improvement can generalize to unseen data. Our ablation study on MMIQC reveals that a large part of the improvement can be attributed to our novel augmentation method, Iterative Question Composing (IQC), which involves iteratively composing new questions from seed problems using an LLM and applying rejection sampling through another LLM. The MMIQC dataset is available on the HuggingFace hub at https://huggingface.co/datasets/Vivacem/MMIQC. Our code is available at https://github.com/iiis-ai/IterativeQuestionComposing.
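
下面用一个简化的 Python 草图示意 IQC 的核心流程:由一个 LLM 从种子问题迭代合成新问题,再用另一个 LLM 做拒绝采样式的过滤。其中 `llm()`、提示词与基于多数一致性的过滤规则均为假设的示意写法,细节与作者的实际流程可能不同。

```python
# Sketch of Iterative Question Composing with rejection sampling; `llm()` is a
# hypothetical completion call and the consistency filter is illustrative.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM backend")

def iqc(seed_question: str, iterations: int = 3, samples: int = 8):
    pairs, question = [], seed_question
    for _ in range(iterations):
        # Compose a new problem that builds on the current one.
        question = llm(f"Compose a new math problem that builds on:\n{question}")
        # Rejection sampling: keep an answer only if a majority of sampled
        # solutions from a second model agree on the final result.
        candidates = [llm(f"Solve step by step, end with 'Answer: ...':\n{question}")
                      for _ in range(samples)]
        finals = [c.rsplit("Answer:", 1)[-1].strip() for c in candidates]
        best = max(set(finals), key=finals.count)
        if finals.count(best) >= samples // 2:      # crude consistency filter
            kept = [c for c, f in zip(candidates, finals) if f == best][0]
            pairs.append((question, kept))
    return pairs
```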


标题: Engineering A Large Language Model From Scratch

作者: Abiodun Finbarrs Oketunji

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2401.16736v2

中文摘要: 自然语言处理(NLP)中深度学习的普及,催生并发布了能够以非凡的熟练程度理解和生成人类语言的创新技术。Atinuke是一种基于Transformer的神经网络,通过利用独特的配置来优化各种语言任务的性能。该体系结构将处理顺序数据的层与注意力机制交织在一起,以建立输入和输出之间有意义的关联。由于其拓扑结构配置和超参数调优,它可以通过提取特征和学习复杂映射来模拟类人语言。Atinuke是模块化、可扩展的,并能与现有的机器学习管道无缝集成。softmax、嵌入和多头注意力等高级矩阵运算能够细致入微地处理文本、声音和视觉信号。通过将现代深度学习技术与软件设计原则和数学理论相结合,该系统在自然语言任务上实现了最先进的结果,同时保持了可解释性和鲁棒性。

摘要: The proliferation of deep learning in natural language processing (NLP) has led to the development and release of innovative technologies capable of understanding and generating human language with remarkable proficiency. Atinuke, a Transformer-based neural network, optimises performance across various language tasks by utilising a unique configuration. The architecture interweaves layers for processing sequential data with attention mechanisms to draw meaningful affinities between inputs and outputs. Due to the configuration of its topology and hyperparameter tuning, it can emulate human-like language by extracting features and learning complex mappings. Atinuke is modular, extensible, and integrates seamlessly with existing machine learning pipelines. Advanced matrix operations like softmax, embeddings, and multi-head attention enable nuanced handling of textual, acoustic, and visual signals. By unifying modern deep learning techniques with software design principles and mathematical theory, the system achieves state-of-the-art results on natural language tasks whilst remaining interpretable and robust.


标题: Hierarchical Continual Reinforcement Learning via Large Language Model

作者: Chaofan Pan, Xin Yang, Hao Wang

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2401.15098v2

中文摘要: 在动态环境中持续学习的能力是强化学习(RL)智能体应用于现实世界的关键要求。尽管在持续强化学习(CRL)方面取得了进展,但现有方法经常受到知识迁移不足的影响,特别是当任务多样化时。为了应对这一挑战,我们提出了一个新的框架——基于大型语言模型的分层持续强化学习(Hi-Core),旨在促进高层知识的迁移。Hi-Core采用双层结构:由大型语言模型(LLM)进行高层策略制定,生成一系列目标;低层策略学习则与面向目标的RL实践紧密结合,生成智能体为达成既定目标所执行的动作。该框架利用反馈来迭代调整和验证高层策略,并将它们与低层策略一起存储在技能库中。当遇到新任务时,Hi-Core从该库中检索相关经验以辅助学习。在Minigrid上的实验表明,Hi-Core能够有效处理多样化的CRL任务,其性能优于流行的基线方法。

摘要: The ability to learn continuously in dynamic environments is a crucial requirement for reinforcement learning (RL) agents applied in the real world. Despite the progress in continual reinforcement learning (CRL), existing methods often suffer from insufficient knowledge transfer, particularly when the tasks are diverse. To address this challenge, we propose a new framework, Hierarchical Continual reinforcement learning via large language model (Hi-Core), designed to facilitate the transfer of high-level knowledge. Hi-Core orchestrates a two-layer structure: high-level policy formulation by a large language model (LLM), which generates a sequence of goals, and low-level policy learning that closely aligns with goal-oriented RL practices, producing the agent’s actions in response to the goals set forth. The framework employs feedback to iteratively adjust and verify high-level policies, storing them along with low-level policies within a skill library. When encountering a new task, Hi-Core retrieves relevant experience from this library to aid learning. Through experiments on Minigrid, Hi-Core has demonstrated its effectiveness in handling diverse CRL tasks, outperforming popular baselines.
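
下面是对上述双层结构的一个粗略 Python 示意:LLM 生成目标序列,目标条件化的低层策略执行动作,经验存入技能库供新任务检索。`llm()`、环境接口与检索方式均为假设的占位符,并非论文的实际实现。

```python
# Rough sketch of an LLM-driven high-level / goal-conditioned low-level loop
# with a skill library; all callables and the env API are hypothetical.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM backend")

skill_library = []   # list of (task_description, goals, low_level_policy)

def retrieve(task: str):
    # Toy retrieval: largest shared-word overlap with stored task descriptions.
    def overlap(entry):
        return len(set(task.split()) & set(entry[0].split()))
    return max(skill_library, key=overlap, default=None)

def solve_task(env, task: str, low_level_policy):
    prior = retrieve(task)
    hint = f"\nRelated experience: {prior[1]}" if prior else ""
    goals = llm(f"Decompose the task into goals, one per line:\n{task}{hint}").splitlines()
    for goal in goals:
        obs = env.reset_goal(goal)            # hypothetical goal-conditioned env API
        done = False
        while not done:
            obs, done = env.step(low_level_policy(obs, goal))
    skill_library.append((task, goals, low_level_policy))
```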


标题: Object-Centric Instruction Augmentation for Robotic Manipulation

作者: Junjie Wen, Yichen Zhu, Minjie Zhu

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2401.02814v2

摘要: Humans interpret scenes by recognizing both the identities and positions of objects in their observations. For a robot to perform tasks such as "pick and place", understanding both what the objects are and where they are located is crucial. While the former has been extensively discussed in the literature that uses the large language model to enrich the text descriptions, the latter remains underexplored. In this work, we introduce the Object-Centric Instruction Augmentation (OCI) framework to augment highly semantic and information-dense language instruction with position cues. We utilize a Multi-modal Large Language Model (MLLM) to weave knowledge of object locations into natural language instruction, thus aiding the policy network in mastering actions for versatile manipulation. Additionally, we present a feature reuse mechanism to integrate the vision-language features from off-the-shelf pre-trained MLLM into policy networks. Through a series of simulated and real-world robotic tasks, we demonstrate that robotic manipulator imitation policies trained with our enhanced instructions outperform those relying solely on traditional language instructions.
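
下面的简短 Python 草图示意"把物体位置信息织入语言指令"的基本做法,其中 `detect_objects()` 为假设的检测器/MLLM 占位接口,拼接格式也仅作示例,并非论文的具体实现。

```python
# Minimal sketch of object-centric instruction augmentation: positions of the
# detected objects are woven into the language instruction before it reaches
# the policy network. The detector call is a hypothetical placeholder.

def detect_objects(image):
    """Hypothetical detector / MLLM call returning (label, (x, y)) pairs."""
    raise NotImplementedError

def augment_instruction(instruction: str, image) -> str:
    cues = [f"{label} at ({x}, {y})" for label, (x, y) in detect_objects(image)]
    return instruction + " Scene objects: " + "; ".join(cues) + "."

# e.g. "pick up the red mug"  ->
#      "pick up the red mug Scene objects: red mug at (120, 85); bowl at (300, 210)."
```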


== VLM ==

标题: Language-Conditioned Robotic Manipulation with Fast and Slow Thinking

作者: Minjie Zhu, Yichen Zhu, Jinming Li

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2401.04181v2

Project: https://jlm-z.github.io/RSFT/

中文摘要: 语言条件机器人操作旨在将自然语言指令转换为可执行的动作,从简单的拾取和放置到需要意图识别和视觉推理的任务。受认知科学中的双重过程理论的启发,该理论提出了人类决策中快速和慢速思维的两个平行系统,我们引入了快速和慢速思维机器人(RFST),这是一个模仿人类认知架构对任务进行分类并根据指令类型在两个系统上做出决策的框架。我们的RFST由两个关键组件组成:1)根据当前用户指令确定应该激活哪个系统的指令鉴别器,以及2)由与策略网络一致的微调视觉语言模型组成的慢速思考系统,该系统允许机器人识别用户意图或执行推理任务。为了评估我们的方法,我们建立了一个以真实世界轨迹为特色的数据集,捕捉从自发冲动到需要深思熟虑的任务的各种行为。我们在模拟和真实世界场景中的结果证实,我们的方法能够熟练地管理需要意图识别和推理的复杂任务。该项目可在https://jlm-z.github.io/RSFT/

摘要: The language-conditioned robotic manipulation aims to transfer natural language instructions into executable actions, from simple pick-and-place to tasks requiring intent recognition and visual reasoning. Inspired by the dual process theory in cognitive science, which suggests two parallel systems of fast and slow thinking in human decision-making, we introduce Robotics with Fast and Slow Thinking (RFST), a framework that mimics human cognitive architecture to classify tasks and makes decisions on two systems based on instruction types. Our RFST consists of two key components: 1) an instruction discriminator to determine which system should be activated based on the current user instruction, and 2) a slow-thinking system that is comprised of a fine-tuned vision language model aligned with the policy networks, which allows the robot to recognize user intention or perform reasoning tasks. To assess our methodology, we built a dataset featuring real-world trajectories, capturing actions ranging from spontaneous impulses to tasks requiring deliberate contemplation. Our results, both in simulation and real-world scenarios, confirm that our approach adeptly manages intricate tasks that demand intent recognition and reasoning. The project is available at https://jlm-z.github.io/RSFT/
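
下面的 Python 草图示意快/慢系统路由的基本思路:由一个判别器决定指令走快系统(直接策略)还是慢系统(先经视觉语言模型推理)。判别规则与各回调函数均为假设示例,并非 RFST 的实际实现。

```python
# Sketch of the fast/slow routing idea: a discriminator decides whether an
# instruction needs deliberate reasoning (slow system with a VLM) or can go
# straight to the policy (fast system). All callables here are placeholders.

def discriminator(instruction: str) -> str:
    # Could itself be a small classifier or an LLM; here a trivial heuristic.
    keywords = ("why", "which", "if", "reason", "figure out")
    return "slow" if any(k in instruction.lower() for k in keywords) else "fast"

def act(instruction: str, observation, fast_policy, slow_vlm, slow_policy):
    if discriminator(instruction) == "fast":
        return fast_policy(observation, instruction)
    # Slow path: the VLM first resolves intent into an explicit goal.
    resolved_goal = slow_vlm(observation, instruction)
    return slow_policy(observation, resolved_goal)
```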


标题: MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer

作者: Yu-Shan Tai, An-Yeu Wu

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2401.14895v2

摘要: While vision transformers (ViTs) have shown great potential in computer vision tasks, their intense computation and memory requirements pose challenges for practical applications. Existing post-training quantization methods leverage value redistribution or specialized quantizers to address the non-normal distribution in ViTs. However, without considering the asymmetry in activations and relying on hand-crafted settings, these methods often struggle to maintain performance under low-bit quantization. To overcome these challenges, we introduce SmoothQuant with bias term (SQ-b) to alleviate the asymmetry issue and reduce the clamping loss. We also introduce optimal scaling factor ratio search (OPT-m) to determine quantization parameters by a data-dependent mechanism automatically. To further enhance the compressibility, we incorporate the above-mentioned techniques and propose a mixed-precision post-training quantization framework for vision transformers (MPTQ-ViT). We develop greedy mixed-precision quantization (Greedy MP) to allocate layer-wise bit-width considering both model performance and compressibility. Our experiments on ViT, DeiT, and Swin demonstrate significant accuracy improvements compared with SOTA on the ImageNet dataset. Specifically, our proposed methods achieve accuracy improvements ranging from 0.90% to 23.35% on 4-bit ViTs with single-precision and from 3.82% to 78.14% on 5-bit fully quantized ViTs with mixed-precision.
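
下面给出一个与 Greedy MP 思路相仿的贪心逐层位宽分配玩具示例:从最高精度出发,每次降低对代理指标影响最小的那一层的位宽,直到满足模型大小预算。`evaluate()`、层参数量与位宽选项均为假设输入,并非论文的实际算法细节。

```python
# Toy greedy layer-wise bit-width allocation in the spirit of "Greedy MP":
# lower the bit width of whichever layer hurts a proxy metric least until a
# size budget is met. `layers` maps layer name -> parameter count (assumed),
# `evaluate(assignment)` returns a proxy accuracy for a bit-width assignment.

def greedy_mixed_precision(layers, evaluate, bit_options=(8, 6, 5, 4), budget_bits=None):
    assign = {name: bit_options[0] for name in layers}          # start at 8 bit
    def total_bits():
        return sum(layers[n] * assign[n] for n in layers)       # params * bits
    while budget_bits is not None and total_bits() > budget_bits:
        best = None
        for name in layers:
            lower = [b for b in bit_options if b < assign[name]]
            if not lower:
                continue
            trial = dict(assign, **{name: lower[0]})             # try next lower bit width
            score = evaluate(trial)                              # proxy accuracy
            if best is None or score > best[0]:
                best = (score, name, lower[0])
        if best is None:
            break                                                # nothing left to lower
        _, name, bits = best
        assign[name] = bits
    return assign
```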


标题: What Do Self-Supervised Speech Models Know About Words?

作者: Ankita Pasad, Chung-Ming Chien, Shane Settle

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2307.00162v3

中文摘要: 在过去几年中引入了许多自监督语音模型(S3Ms),提高了各种语音任务的性能和数据效率。然而,仅凭这些经验上的成功并不能完整地描述模型在预训练中学到了什么。最近的工作已经开始分析S3Ms如何编码某些属性,如音素和说话人信息,但我们仍然缺乏对单词级及更高层级所编码知识的充分理解。在这项工作中,我们使用轻量级分析方法来研究S3Ms中编码的片段级语言属性——单词身份、边界、发音、句法特征和语义特征。我们对来自10个S3Ms的逐层表示进行了比较研究,发现(i)每个词段内的帧级表示并不都具有相同的信息量,以及(ii)预训练目标和模型大小严重影响语言信息在各层之间的可访问性和分布。我们还发现,在几项任务上——单词辨别、单词分割和语义句子相似性——以视觉接地(visual grounding)方式训练的S3Ms优于纯语音版本。最后,我们基于任务的分析表明,在单词分割和声学单词辨别上,使用比先前工作更简单的方法即可获得更好的性能。

摘要: Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks. However, these empirical successes alone do not give a complete picture of what is learned during pre-training. Recent work has begun analyzing how S3Ms encode certain properties, such as phonetic and speaker information, but we still lack a proper understanding of knowledge encoded at the word level and beyond. In this work, we use lightweight analysis methods to study segment-level linguistic properties – word identity, boundaries, pronunciation, syntactic features, and semantic features – encoded in S3Ms. We present a comparative study of layer-wise representations from ten S3Ms and find that (i) the frame-level representations within each word segment are not all equally informative, and (ii) the pre-training objective and model size heavily influence the accessibility and distribution of linguistic information across layers. We also find that on several tasks – word discrimination, word segmentation, and semantic sentence similarity – S3Ms trained with visual grounding outperform their speech-only counterparts. Finally, our task-based analyses demonstrate improved performance on word segmentation and acoustic word discrimination while using simpler methods than prior work.
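
下面是一个可运行的小例子,示意摘要所说的轻量级探针分析:把词段内的帧级特征做平均池化,再用线性探针预测词级属性。特征与标注均为随机生成的玩具数据,仅用于说明流程,并非论文的真实实验设置。

```python
# A minimal example of segment-level probing: frame features inside each word
# segment are mean-pooled and a linear probe predicts a word-level property.
# The "features" here are random toy data standing in for one S3M layer.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pool_segments(frames: np.ndarray, segments):
    """frames: (T, D) layer activations; segments: list of (start, end, label)."""
    X = np.stack([frames[s:e].mean(axis=0) for s, e, _ in segments])
    y = np.array([label for _, _, label in segments])
    return X, y

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 768))                      # 200 frames, 768-dim
segments = [(i * 20, i * 20 + 20, i % 2) for i in range(10)]  # toy word segments

X, y = pool_segments(frames, segments)
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe accuracy on the toy data:", probe.score(X, y))
```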


标题: Towards Few-shot Out-of-Distribution Detection

作者: Jiuqing Dong, Yongbin Gao, Heng Zhou

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2311.12076v3

中文摘要: 分布外(OOD)检测对于确保开放世界智能系统的可靠性至关重要。尽管现有的OOD检测方法取得了显著进展,但我们的研究发现,在训练样本稀缺的情况下,其性能会显著下降。在这种情况下,我们提出了一个新的少样本OOD检测基准,经过精心构建以填补这一空白。我们的实证分析表明,在少样本OOD检测任务中,参数高效微调(PEFT)策略(如视觉提示微调和视觉适配器微调)优于传统技术(包括全量微调和线性探测)。考虑到预训练模型中一些对OOD检测至关重要的信息可能在微调过程中丢失,我们提出了一种称为领域特定与通用知识融合(DSGF)的方法。该方法旨在与不同的微调框架兼容。我们的实验表明,集成DSGF可以显著增强各种方法和微调方式(包括全量微调、视觉适配器微调和视觉提示微调)下的少样本OOD检测能力。代码将开源发布。

摘要: Out-of-distribution (OOD) detection is critical for ensuring the reliability of open-world intelligent systems. Despite the notable advancements in existing OOD detection methodologies, our study identifies a significant performance drop under the scarcity of training samples. In this context, we introduce a novel few-shot OOD detection benchmark, carefully constructed to address this gap. Our empirical analysis reveals the superiority of Parameter-Efficient Fine-Tuning (PEFT) strategies, such as visual prompt tuning and visual adapter tuning, over conventional techniques, including fully fine-tuning and linear probing tuning in the few-shot OOD detection task. Recognizing some crucial information from the pre-trained model, which is pivotal for OOD detection, may be lost during the fine-tuning process, we propose a method termed Domain-Specific and General Knowledge Fusion (DSGF). This approach is designed to be compatible with diverse fine-tuning frameworks. Our experiments show that the integration of DSGF significantly enhances the few-shot OOD detection capabilities across various methods and fine-tuning methodologies, including fully fine-tuning, visual adapter tuning, and visual prompt tuning. The code will be released.
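
DSGF 的具体融合方式摘要中未给出细节;下面仅用一个假设性的 Python 草图示意"融合预训练通用特征与微调后领域特征,并据此计算 OOD 分数"的一般思路,与作者的真实方法可能不同。

```python
# Hedged sketch of fusing general (pre-trained) and domain-specific
# (fine-tuned) features for OOD scoring; the concrete fusion and score used
# by DSGF may differ -- this is only an illustration.
import numpy as np

def fuse(general_feat: np.ndarray, specific_feat: np.ndarray) -> np.ndarray:
    # Simple fusion by L2-normalising each view and concatenating.
    g = general_feat / (np.linalg.norm(general_feat, axis=-1, keepdims=True) + 1e-8)
    s = specific_feat / (np.linalg.norm(specific_feat, axis=-1, keepdims=True) + 1e-8)
    return np.concatenate([g, s], axis=-1)

def ood_score(x_fused: np.ndarray, class_prototypes: np.ndarray) -> np.ndarray:
    # Higher score = more likely out-of-distribution (far from every prototype).
    dists = np.linalg.norm(x_fused[:, None, :] - class_prototypes[None], axis=-1)
    return dists.min(axis=1)

# Usage idea: prototypes are mean fused features of the few-shot ID classes;
# test samples whose score exceeds a validation threshold are flagged as OOD.
```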


标题: Image Translation as Diffusion Visual Programmers

作者: Cheng Han, James C. Liang, Qifan Wang

PubTime: 2024-01-30

Downlink: http://arxiv.org/abs/2401.09742v2

中文摘要: 我们介绍了新颖的扩散视觉编程器(DVP),一个神经符号图像翻译框架。我们提出的DVP在GPT架构中无缝嵌入了条件灵活的扩散模型,为各种亲符号步骤编排了一系列连贯的视觉程序(即计算机视觉模型),这些步骤涵盖RoI识别、风格迁移和位置操纵,促进了透明且可控的图像翻译过程。大量的实验证明了DVP的卓越性能,超越了同期方法。这一成功可以归因于DVP的几个关键特征:首先,DVP通过实例归一化实现了条件灵活的翻译,使模型能够消除由手动指导引起的敏感性,并最优地专注于文本描述,以生成高质量的内容。第二,该框架通过将特征空间中复杂的高维概念解译为更容易访问的低维符号(例如,【提示】、【RoI对象】)来增强上下文内推理,支持局部化的、上下文无关的编辑,同时保持整体一致性。最后但同样重要的是,DVP通过在每个编程阶段提供明确的符号表示,使用户能够直观地解释和修改结果,从而提高了系统的可控性和可解释性。我们的研究标志着向协调人工图像翻译过程与认知智能迈出了实质性的一步,有望获得更广泛的应用。

摘要: We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image translation framework. Our proposed DVP seamlessly embeds a condition-flexible diffusion model within the GPT architecture, orchestrating a coherent sequence of visual programs (i.e., computer vision models) for various pro-symbolic steps, which span RoI identification, style transfer, and position manipulation, facilitating transparent and controllable image translation processes. Extensive experiments demonstrate DVP’s remarkable performance, surpassing concurrent arts. This success can be attributed to several key features of DVP: First, DVP achieves condition-flexible translation via instance normalization, enabling the model to eliminate sensitivity caused by the manual guidance and optimally focus on textual descriptions for high-quality content generation. Second, the framework enhances in-context reasoning by deciphering intricate high-dimensional concepts in feature spaces into more accessible low-dimensional symbols (e.g., [Prompt], [RoI object]), allowing for localized, context-free editing while maintaining overall coherence. Last but not least, DVP improves systemic controllability and explainability by offering explicit symbolic representations at each programming stage, empowering users to intuitively interpret and modify results. Our research marks a substantial step towards harmonizing artificial image translation processes with cognitive intelligence, promising broader applications.


标题: SAM-based instance segmentation models for the automation of structural damage detection

作者: Zehao Ye, Lucy Lovell, Asaad Faramarzi

PubTime: 2024-01-30

Downlink: http://arxiv.org/abs/2401.15266v2

中文摘要: 基于土木结构外观捕捉缺陷的视觉检测目前仍是劳动密集且耗时的工作,因此实现其自动化至关重要。自动检测的一个重要环节是图像采集;考虑到近年来软硬件计算的普遍发展,图像采集已变得快速且经济。以前的研究主要集中在混凝土和沥青上,很少关注砖石裂缝,后者也缺乏公开可用的数据集。在本文中,我们首先提出了一个用于实例分割的数据集MCrack1300,包含1,300张标注图像(640像素×640像素),涵盖砖块、碎砖和裂缝。然后,我们测试了若干领先算法作为基准,包括最新的大规模模型——基于提示的Segment Anything Model(SAM)。我们使用低秩自适应(LoRA)对编码器进行微调,并提出了两种自动化SAM执行的新方法。第一种方法放弃提示编码器并将SAM编码器连接到其他解码器,第二种方法则引入了可学习的自生成提示器。为了保证这两种方法与SAM编码器部分的无缝集成,我们重新设计了特征提取器。两种方法均超越了最先进的性能,在所有类别上比最佳基准高约3%,在裂缝类别上高约6%。在成功检测的基础上,我们提出了一种基于单目相机和霍夫线变换的方法,自动将图像转换成正射投影图。通过结合砖单元的已知真实尺寸,我们精确地估计裂缝尺寸,结果与激光扫描获得的结果相差不到10%。总的来说,我们填补了自动化砌体裂缝检测和尺寸估计方面的重要研究空白。

摘要: Automating visual inspection for capturing defects based on civil structures appearance is crucial due to its currently labour-intensive and time-consuming nature. An important aspect of automated inspection is image acquisition, which is rapid and cost-effective considering the pervasive developments in both software and hardware computing in recent years. Previous studies largely focused on concrete and asphalt, with less attention to masonry cracks. The latter also lacks publicly available datasets. In this paper, we first present a corresponding data set for instance segmentation with 1,300 annotated images (640 pixels x 640 pixels), named MCrack1300, covering bricks, broken bricks, and cracks. We then test several leading algorithms for benchmarking, including the latest large-scale model, the prompt-based Segment Anything Model (SAM). We fine-tune the encoder using Low-Rank Adaptation (LoRA) and propose two novel methods for automation of SAM execution. The first method involves abandoning the prompt encoder and connecting the SAM encoder to other decoders, while the second method introduces a learnable self-generating prompter. In order to ensure the seamless integration of the two proposed methods with the SAM encoder, we redesign the feature extractor. Both proposed methods exceed state-of-the-art performance, surpassing the best benchmark by approximately 3% for all classes and around 6% for cracks specifically. Based on successful detection, we propose a method based on a monocular camera and the Hough Line Transform to automatically transform images into orthographic projection maps. By incorporating known real sizes of brick units, we accurately estimate crack dimensions, with the results differing by less than 10% from those obtained by laser scanning. Overall, we address important research gaps in automated masonry crack detection and size estimation.
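
下面用一个简化的 Python 片段示意最后的尺寸估计一步:在图像被矫正为正射投影后,利用砖块的已知真实尺寸得到像素-毫米比例,再换算裂缝尺寸。其中 215 毫米的砖长只是示例假设(常见英制标准砖长),单目相机加霍夫线变换的矫正过程此处未展示。

```python
# Simplified illustration of the final measurement step: once the image has
# been rectified to an orthographic view, a known real brick dimension gives a
# pixel-to-millimetre scale that converts crack measurements.
def crack_size_mm(crack_length_px: float, crack_width_px: float,
                  brick_length_px: float, brick_length_mm: float = 215.0):
    # 215 mm is a common UK standard brick length -- an assumption for this
    # example, not a value taken from the paper.
    mm_per_px = brick_length_mm / brick_length_px
    return crack_length_px * mm_per_px, crack_width_px * mm_per_px

# e.g. crack_size_mm(480, 6, brick_length_px=360) -> (~287 mm, ~3.6 mm)
```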


== diffusion model ==

标题: BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

作者: Zhennan Wu, Yang Li, Han Yan

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.17053v2

Project: https://www.youtube.com/watch?v=PxIBtd6G0mA

中文摘要: 我们提出了BlockFusion,这是一个基于扩散的模型,它以单元块的形式生成3D场景,并能无缝地合并新块来扩展场景。BlockFusion使用从完整3D场景网格中随机裁剪的3D块数据集进行训练。通过逐块拟合,所有训练块被转换成混合神经场:由包含几何特征的三平面,以及随后用于解码有符号距离值的多层感知器(MLP)组成。采用变分自编码器将三平面压缩到潜在三平面空间中,并在该空间上进行去噪扩散过程。应用于潜在表示的扩散允许高质量和多样化的3D场景生成。为了在生成过程中扩展场景,只需添加与当前场景重叠的空块,并外推现有的潜在三平面以填充新块。外推是通过在去噪迭代期间用来自重叠三平面的特征样本调节生成过程来完成的。潜在三平面外推产生语义和几何上有意义的过渡,与现有场景和谐融合。2D布局调节机制用于控制场景元素的放置和排列。实验结果表明,BlockFusion能够在室内和室外场景中生成多样、几何一致且无界的大型3D场景,并具有前所未有的高质量形状。

摘要: We present BlockFusion, a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene. BlockFusion is trained using datasets of 3D blocks that are randomly cropped from complete 3D scene meshes. Through per-block fitting, all training blocks are converted into the hybrid neural fields: with a tri-plane containing the geometry features, followed by a Multi-layer Perceptron (MLP) for decoding the signed distance values. A variational auto-encoder is employed to compress the tri-planes into the latent tri-plane space, on which the denoising diffusion process is performed. Diffusion applied to the latent representations allows for high-quality and diverse 3D scene generation. To expand a scene during generation, one needs only to append empty blocks to overlap with the current scene and extrapolate existing latent tri-planes to populate new blocks. The extrapolation is done by conditioning the generation process with the feature samples from the overlapping tri-planes during the denoising iterations. Latent tri-plane extrapolation produces semantically and geometrically meaningful transitions that harmoniously blend with the existing scene. A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements. Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent and unbounded large 3D scenes with unprecedented high-quality shapes in both indoor and outdoor scenarios.


标题: Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance

作者: Qingcheng Zhao, Pengyu Long, Qixuan Zhang

PubTime: 2024-01-30

Downlink: http://arxiv.org/abs/2401.15687v2

Project: https://sites.google.com/view/media2face

中文摘要: 从语音合成3D面部动画已经获得了相当大的关注。由于缺乏高质量的4D面部数据和标注良好的丰富多模态标签,以前的方法往往真实感有限,且缺乏灵活的条件控制。我们通过三部曲来应对这一挑战。我们首先介绍了广义神经参数面部资产(GNPFA),这是一种高效的变分自编码器,将面部几何和图像映射到高度泛化的表情潜在空间,解耦表情和身份。然后,我们利用GNPFA从大量视频中提取高质量的表情和准确的头部姿势,由此构建了M2F-D数据集——一个大型、多样化、扫描级的协同语音3D面部动画数据集,带有良好标注的情感和风格标签。最后,我们提出了Media2Face,这是一个在GNPFA潜在空间中用于协同语音面部动画生成的扩散模型,接受来自音频、文本和图像的丰富多模态指导。大量实验表明,该模型不仅实现了高保真的面部动画合成,而且拓宽了三维面部动画的表现力范围和风格适应性。

摘要: The synthesis of 3D facial animations from speech has garnered considerable attention. Due to the scarcity of high-quality 4D facial data and well-annotated abundant multi-modality labels, previous methods often suffer from limited realism and a lack of flexible conditioning. We address this challenge through a trilogy. We first introduce Generalized Neural Parametric Facial Asset (GNPFA), an efficient variational auto-encoder mapping facial geometry and images to a highly generalized expression latent space, decoupling expressions and identities. Then, we utilize GNPFA to extract high-quality expressions and accurate head poses from a large array of videos. This presents the M2F-D dataset, a large, diverse, and scan-level co-speech 3D facial animation dataset with well-annotated emotional and style labels. Finally, we propose Media2Face, a diffusion model in GNPFA latent space for co-speech facial animation generation, accepting rich multi-modality guidances from audio, text, and image. Extensive experiments demonstrate that our model not only achieves high fidelity in facial animation synthesis but also broadens the scope of expressiveness and style adaptability in 3D facial animation.


标题: Diffusion Model Conditioning on Gaussian Mixture Model and Negative Gaussian Mixture Gradient

作者: Weiguo Lu, Xuan Wu, Deng Ding

PubTime: 2024-02-01

Downlink: http://arxiv.org/abs/2401.11261v2

中文摘要: 扩散模型(DMs)是一种生成模型,对图像合成和其他方面有着巨大的影响。它们在各种生成任务中实现最先进的生成结果。各种各样的条件输入,如文本或边界框,都可以用来控制生成。在这项工作中,我们提出了一种利用高斯混合模型(GMM)作为特征条件来指导去噪过程的条件机制。基于集合论,我们提供了一个全面的理论分析,表明基于特征和类的条件潜在分布是显著不同的,因此特征上的条件潜在分布比基于类的条件潜在分布产生更少的缺陷生成。分别训练了基于高斯混合模型的两个扩散模型进行比较。实验支持我们的发现。提出了一种新的梯度函数,称为负高斯混合梯度(NGMG),并将其应用于带有附加分类器的扩散模型训练中。训练稳定性提高了。我们还从理论上证明,当学习低维流形支持的分布时,NGMG作为一个更合理的成本函数,与推土机距离(Wasserstein)具有相同的优势。

摘要: Diffusion models (DMs) are a type of generative model that has a huge impact on image synthesis and beyond. They achieve state-of-the-art generation results in various generative tasks. A great diversity of conditioning inputs, such as text or bounding boxes, are accessible to control the generation. In this work, we propose a conditioning mechanism utilizing Gaussian mixture models (GMMs) as feature conditioning to guide the denoising process. Based on set theory, we provide a comprehensive theoretical analysis that shows that conditional latent distribution based on features and classes is significantly different, so that conditional latent distribution on features produces fewer defect generations than conditioning on classes. Two diffusion models conditioned on the Gaussian mixture model are trained separately for comparison. Experiments support our findings. A novel gradient function called the negative Gaussian mixture gradient (NGMG) is proposed and applied in diffusion model training with an additional classifier. Training stability has improved. We also theoretically prove that NGMG shares the same benefit as the Earth Mover distance (Wasserstein) as a more sensible cost function when learning distributions supported by low-dimensional manifolds.


标题: Wind speed super-resolution and validation: from ERA5 to CERRA via diffusion models

作者: Fabio Merizzi, Andrea Asperti, Stefano Colamonaco

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.15469v2

中文摘要: 哥白尼欧洲区域再分析(CERRA)是欧洲区域的高分辨率区域再分析数据集。近年来,它在各种与气候相关的任务中显示出重要的效用,从预报和气候变化研究到可再生能源预测、资源管理、空气质量风险评估和罕见事件的预报等。遗憾的是,由于获取所需外部数据的限制以及生成CERRA固有的巨大计算需求,CERRA数据的发布滞后于当前日期约两年。作为解决方案,本文介绍了一种新方法,使用扩散模型以数据驱动的方式近似CERRA降尺度,而无需额外的信息。通过利用为CERRA提供边界条件的低分辨率ERA5数据集,我们将其作为超分辨率任务来处理。我们聚焦于意大利周边的风速,基于现有CERRA数据训练的模型取得了有希望的结果,与原始CERRA数据高度吻合。与现场观测的对比验证进一步证实了模型在逼近地面测量方面的准确性。

摘要: The Copernicus Regional Reanalysis for Europe, CERRA, is a high-resolution regional reanalysis dataset for the European domain. In recent years it has shown significant utility across various climate-related tasks, ranging from forecasting and climate change research to renewable energy prediction, resource management, air quality risk assessment, and the forecasting of rare events, among others. Unfortunately, the availability of CERRA is lagging two years behind the current date, due to constraints in acquiring the requisite external data and the intensive computational demands inherent in its generation. As a solution, this paper introduces a novel method using diffusion models to approximate CERRA downscaling in a data-driven manner, without additional information. By leveraging the lower resolution ERA5 dataset, which provides boundary conditions for CERRA, we approach this as a super-resolution task. Focusing on wind speed around Italy, our model, trained on existing CERRA data, shows promising results, closely mirroring original CERRA data. Validation with in-situ observations further confirms the model’s accuracy in approximating ground measurements.
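
下面是一个示意性的 PyTorch 训练步骤,展示"以上采样后的低分辨率 ERA5 场为条件、对高分辨率 CERRA 场做去噪扩散"这一超分辨率思路。网络结构、噪声调度与数据形状均为占位假设,并非作者的实际配置。

```python
# Schematic training step for a conditional denoising diffusion model that
# maps low-resolution ERA5 wind fields to high-resolution CERRA-like fields.
# The model, shapes and schedule are illustrative placeholders.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, optimizer, hr_wind, lr_wind):
    """hr_wind: (B,1,H,W) CERRA patch; lr_wind: (B,1,h,w) matching ERA5 patch."""
    B = hr_wind.shape[0]
    t = torch.randint(0, T, (B,), device=hr_wind.device)
    noise = torch.randn_like(hr_wind)
    a = alphas_cumprod.to(hr_wind.device)[t].view(B, 1, 1, 1)
    noisy = a.sqrt() * hr_wind + (1 - a).sqrt() * noise
    # Condition by concatenating the upsampled low-resolution field.
    cond = F.interpolate(lr_wind, size=hr_wind.shape[-2:], mode="bilinear",
                         align_corners=False)
    pred = model(torch.cat([noisy, cond], dim=1), t)   # model predicts the noise
    loss = F.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```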


标题: Generative Design of Crystal Structures by Point Cloud Representations and Diffusion Model

作者: Zhelin Li, Rami Mrad, Runxian Jiao

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.13192v2

中文摘要: 有效地产生能量稳定的晶体结构长期以来一直是材料设计中的一个挑战,这主要是由于晶格中原子的巨大排列。为了促进稳定材料的发现,我们提出了一个生成可合成材料的框架,利用点云表示来编码复杂的结构信息。这个框架的核心是引入一个扩散模型作为其基础支柱。为了衡量我们的方法的有效性,我们使用它从我们的训练数据集重建输入结构,严格验证其高重建性能。此外,我们通过产生全新的材料,强调它们的可合成性,展示了基于点云的晶体扩散(PCCD)的巨大潜力。我们的研究通过创成式设计的前沿途径,而不是传统的替代或基于经验的发现,为材料设计和合成的进步做出了显著的贡献。

摘要: Efficiently generating energetically stable crystal structures has long been a challenge in material design, primarily due to the immense arrangement of atoms in a crystal lattice. To facilitate the discovery of stable material, we present a framework for the generation of synthesizable materials, leveraging a point cloud representation to encode intricate structural information. At the heart of this framework lies the introduction of a diffusion model as its foundational pillar. To gauge the efficacy of our approach, we employ it to reconstruct input structures from our training datasets, rigorously validating its high reconstruction performance. Furthermore, we demonstrate the profound potential of Point Cloud-Based Crystal Diffusion (PCCD) by generating entirely new materials, emphasizing their synthesizability. Our research stands as a noteworthy contribution to the advancement of materials design and synthesis through the cutting-edge avenue of generative design instead of the conventional substitution or experience-based discovery.


== Visual Language Navigation ==

标题: SubPipe: A Submarine Pipeline Inspection Dataset for Segmentation and Visual-inertial Localization

作者: Olaya Álvarez-Tuñón, Luiza Ribeiro Marnet, László Antal

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.17907v1

GitHub: https://github.com/remaro-network/SubPipe-dataset

中文摘要: 本文介绍了SubPipe,这是一个用于SLAM、目标检测和图像分割的水下数据集。SubPipe使用由OceanScan MST运营的LAUV进行记录,该航行器搭载了包括两台相机、一台侧扫声纳和一套惯性导航系统在内的多种传感器。该AUV被部署在管道检查环境中,海底管道部分被沙子覆盖。AUV的位姿真值由导航传感器估计得到。侧扫声纳图像和RGB图像分别带有目标检测和分割标注。我们在SubPipe上对最先进的分割、目标检测和SLAM方法进行了基准测试,以展示该数据集在应用计算机视觉算法方面的挑战和机遇。据作者所知,这是第一个提供真实管道检查场景的带标注水下数据集。数据集和实验已公开于https://github.com/remaro-network/SubPipe-dataset

摘要: This paper presents SubPipe, an underwater dataset for SLAM, object detection, and image segmentation. SubPipe has been recorded using a LAUV, operated by OceanScan MST, and carrying a sensor suite including two cameras, a side-scan sonar, and an inertial navigation system, among other sensors. The AUV has been deployed in a pipeline inspection environment with a submarine pipe partially covered by sand. The AUV’s pose ground truth is estimated from the navigation sensors. The side-scan sonar and RGB images include object detection and segmentation annotations, respectively. State-of-the-art segmentation, object detection, and SLAM methods are benchmarked on SubPipe to demonstrate the dataset’s challenges and opportunities for leveraging computer vision algorithms. To the authors’ knowledge, this is the first annotated underwater dataset providing a real pipeline inspection scenario. The dataset and experiments are publicly available online at https://github.com/remaro-network/SubPipe-dataset


标题: ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models

作者: Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang

PubTime: 2024-01-24

Downlink: http://arxiv.org/abs/2401.13311v1

Project: https://con-textual.github.io/

中文摘要: 人工智能的最新进展导致了大型多模态模型(LMM)的发展,这些模型能够处理复杂的任务,包括对图像中的文本和视觉内容进行联合推理(例如,在公共场所导航地图)。本文介绍了ConTextual,这是一个新颖的基准测试,包括明确设计的指令,用于评估LMMs执行上下文敏感的文本丰富的可视化推理的能力。上下文强调不同的真实世界场景(例如,时间阅读、导航、购物等),要求更深入地理解文本和视觉元素之间的交互。我们的发现揭示了表现最好的LMM、GPT-4V(ision)和使用人类评估的人类能力之间30.8%的显著性能差距,表明在上下文敏感的文本丰富的视觉推理方面有很大的改进空间。值得注意的是,虽然GPT-4V在模因和引用解释等抽象类别中表现出色,但其整体表现仍落后于人类。除了人工评估,我们还采用了使用GPT-4的自动评估指标,揭示了绩效差异的类似趋势。我们还在不同的视觉环境中进行细粒度的评估,并提供定性分析,为LMM设计的未来发展提供了一个强大的框架。https://con-textual.github.io/

摘要: Recent advancements in AI have led to the development of large multimodal models (LMMs) capable of processing complex tasks involving joint reasoning over text and visual content in the image (e.g., navigating maps in public places). This paper introduces ConTextual, a novel benchmark comprising instructions designed explicitly to evaluate LMMs’ ability to perform context-sensitive text-rich visual reasoning. ConTextual emphasizes diverse real-world scenarios (e.g., time-reading, navigation, shopping and more) demanding a deeper understanding of the interactions between textual and visual elements. Our findings reveal a significant performance gap of 30.8% between the best-performing LMM, GPT-4V(ision), and human capabilities using human evaluation indicating substantial room for improvement in context-sensitive text-rich visual reasoning. Notably, while GPT-4V excelled in abstract categories like meme and quote interpretation, its overall performance still lagged behind humans. In addition to human evaluations, we also employed automatic evaluation metrics using GPT-4, uncovering similar trends in performance disparities. We also perform a fine-grained evaluation across diverse visual contexts and provide qualitative analysis which provides a robust framework for future advancements in the LMM design. https://con-textual.github.io/
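
下面的 Python 草图示意摘要提到的"用 GPT-4 做自动评估"这类 LLM 评审流程。`judge()` 调用与提示词均为假设写法,并非该基准的官方评估协议。

```python
# Sketch of an LLM-as-judge style automatic evaluation; `judge()` is a
# hypothetical call into whatever judge model you use.

def judge(prompt: str) -> str:
    raise NotImplementedError("call your preferred judge model here")

def auto_eval(samples):
    """samples: iterable of dicts with 'instruction', 'reference' and 'response'."""
    samples = list(samples)
    accepted = 0
    for s in samples:
        verdict = judge(
            "You are grading an answer about an image that contains text.\n"
            f"Instruction: {s['instruction']}\n"
            f"Reference answer: {s['reference']}\n"
            f"Model answer: {s['response']}\n"
            "Reply ACCEPT if the model answer is correct, otherwise REJECT."
        )
        accepted += verdict.strip().upper().startswith("ACCEPT")
    return accepted / max(len(samples), 1)   # acceptance rate
```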


标题: SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization

作者: Mingyang Li, Yue Ma, Qinru Qiu

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2401.13076v1

GitHub: https://github.com/Leomingyangli/SemanticSLAM

中文摘要: 视觉同步定位与建图(VSLAM)的现有技术通过比较连续场景的图像特征来估计相机位移。这些算法依赖于场景的连续性,因此需要频繁的相机输入。然而,频繁处理图像会导致大量的内存使用和计算开销。在这项研究中,我们介绍了SemanticSLAM,这是一个端到端的视觉惯性里程计系统,它利用从RGB-D传感器提取的语义特征。这种方法能够创建环境的语义图,并确保可靠的相机定位。SemanticSLAM是场景无关的,这意味着它不需要针对不同的环境进行重新训练。即使相机输入不频繁且没有先验知识,它也能在室内环境中有效工作。SemanticSLAM的优势在于它能够逐步细化语义图并改进位姿估计。这是通过卷积长短期记忆(ConvLSTM)网络实现的,该网络经过训练可以在地图构建过程中纠正错误。与现有的VSLAM算法相比,SemanticSLAM将位姿估计精度提高了17%。由此产生的语义图提供了关于环境的可解释信息,并且可以容易地应用于各种下游任务,例如路径规划、避障和机器人导航。代码将公开于https://github.com/Leomingyangli/SemanticSLAM

摘要: Current techniques in Visual Simultaneous Localization and Mapping (VSLAM) estimate camera displacement by comparing image features of consecutive scenes. These algorithms depend on scene continuity, hence requires frequent camera inputs. However, processing images frequently can lead to significant memory usage and computation overhead. In this study, we introduce SemanticSLAM, an end-to-end visual-inertial odometry system that utilizes semantic features extracted from an RGB-D sensor. This approach enables the creation of a semantic map of the environment and ensures reliable camera localization. SemanticSLAM is scene-agnostic, which means it doesn’t require retraining for different environments. It operates effectively in indoor settings, even with infrequent camera input, without prior knowledge. The strength of SemanticSLAM lies in its ability to gradually refine the semantic map and improve pose estimation. This is achieved by a convolutional long-short-term-memory (ConvLSTM) network, trained to correct errors during map construction. Compared to existing VSLAM algorithms, SemanticSLAM improves pose estimation by 17%. The resulting semantic map provides interpretable information about the environment and can be easily applied to various downstream tasks, such as path planning, obstacle avoidance, and robot navigation. The code will be publicly available at https://github.com/Leomingyangli/SemanticSLAM


标题: ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments

作者: Dong An, Hanqing Wang, Wenguan Wang

PubTime: 2024-01-22

Downlink: http://arxiv.org/abs/2304.03047v3

GitHub: https://github.com/MarSaKi/ETPNav

中文摘要: 视觉语言导航是一项需要智能体按照指令在环境中导航的任务。它在具身智能领域变得越来越重要,在自主导航、搜索救援以及人机交互方面具有潜在的应用。在本文中,我们提出了一个更实际但更具挑战性的对应设置——连续环境中的视觉语言导航(VLN-CE)。为了开发一个鲁棒的VLN-CE智能体,我们提出了一个新的导航框架ETPNav,它专注于两个关键技能:1)抽象环境并生成长程导航规划的能力,以及2)在连续环境中的避障控制能力。ETPNav通过沿着已走过的路径自组织预测的航路点来对环境进行在线拓扑建图,而无需先验的环境经验。它使智能体能够将导航过程分解为高层规划和低层控制。同时,ETPNav利用基于Transformer的跨模态规划器,根据拓扑图和指令生成导航规划。然后,该规划由避障控制器执行,该控制器利用试错启发式方法防止导航陷入障碍物。实验结果证明了该方法的有效性:在R2R-CE和RxR-CE数据集上,ETPNav相比之前的最优方法分别取得了超过10%和20%的提升。我们的代码可在https://github.com/MarSaKi/ETPNav获取。

摘要: Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments. It becomes increasingly crucial in the field of embodied AI, with potential applications in autonomous navigation, search and rescue, and human-robot interaction. In this paper, we propose to address a more practical yet challenging counterpart setting - vision-language navigation in continuous environments (VLN-CE). To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments. ETPNav performs online topological mapping of environments by self-organizing predicted waypoints along a traversed path, without prior environmental experience. It privileges the agent to break down the navigation procedure into high-level planning and low-level control. Concurrently, ETPNav utilizes a transformer-based cross-modal planner to generate navigation plans based on topological maps and instructions. The plan is then performed through an obstacle-avoiding controller that leverages a trial-and-error heuristic to prevent navigation from getting stuck in obstacles. Experimental results demonstrate the effectiveness of the proposed method. ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on R2R-CE and RxR-CE datasets, respectively. Our code is available at https://github.com/MarSaKi/ETPNav.
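
下面用一个玩具级的 Python 类示意"沿途自组织预测航路点、在线构建拓扑图"的基本思路:每个预测航路点要么并入附近已有节点,要么新建节点并与当前节点连边。阈值与航路点预测器均为假设,并非 ETPNav 的实际实现。

```python
# Toy illustration of online topological mapping by self-organising predicted
# waypoints; thresholds and the waypoint predictor are placeholders.
import math

class TopoMap:
    def __init__(self, merge_radius: float = 0.5):
        self.nodes = {0: (0.0, 0.0)}   # node_id -> (x, y); node 0 = start pose
        self.edges = set()             # undirected pairs of node_ids
        self.merge_radius = merge_radius
        self._next_id = 1

    def _nearest(self, pos):
        node_id, node_pos = min(self.nodes.items(),
                                key=lambda kv: math.dist(kv[1], pos))
        return node_id if math.dist(node_pos, pos) <= self.merge_radius else None

    def add_waypoints(self, current_id, predicted_positions):
        for pos in predicted_positions:
            node = self._nearest(pos)
            if node is None:                       # genuinely new place
                node = self._next_id
                self.nodes[node] = pos
                self._next_id += 1
            if node != current_id:                 # link it to the current node
                self.edges.add(tuple(sorted((current_id, node))))
        return self

# Usage idea: after every panoramic observation, feed the waypoint predictor's
# outputs into add_waypoints(), then let the planner reason over nodes/edges.
```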


标题: Multimotion Visual Odometry (MVO)

作者: Kevin M. Judd, Jonathan D. Gammell

PubTime: 2024-01-15

Downlink: http://arxiv.org/abs/2110.15169v3

Project: https://www.youtube.com/watch?v=mNj3s1nf-6A | https://www.youtube.com/playlist?list=PLbaQBz4TuPcxMIXKh5Q80s0N9ISezFcpi

中文摘要: 视觉运动估计是自主导航中一个研究得很好的挑战。最近的工作集中于解决高度动态环境中的多运动估计问题。这些环境不仅包括多个复杂的运动,而且往往表现出显著的遮挡。很难同时估计第三方运动和传感器自运动,因为物体的观测运动包括其真实运动和传感器运动。先前在多运动估计中的大多数工作通过依赖于基于外观的对象检测或特定于应用程序的运动约束来简化这个问题。这些方法在特定的应用程序和环境中是有效的,但不能很好地推广到完整的多运动估计问题(MEP)。本文介绍了Multimotion Visual Odometry(MVO),这是一种多运动估计管道,它估计场景中每个运动的完整SE(3)轨迹,包括传感器自身运动,而不依赖于基于外观的信息。MVO通过多运动分割和跟踪技术扩展了传统的视觉里程计(VO)管道。它使用物理建立的运动先验来推断通过临时遮挡的运动,并通过运动闭合来识别运动的再现。对牛津多运动数据集(OMD)和KITTI Vision Benchmark Suite的真实世界数据的评估表明,与类似方法相比,MVO实现了良好的估计精度,并适用于各种多运动估计挑战

摘要: Visual motion estimation is a well-studied challenge in autonomous navigation. Recent work has focused on addressing multimotion estimation in highly dynamic environments. These environments not only comprise multiple, complex motions but also tend to exhibit significant occlusion. Estimating third-party motions simultaneously with the sensor egomotion is difficult because an object’s observed motion consists of both its true motion and the sensor motion. Most previous works in multimotion estimation simplify this problem by relying on appearance-based object detection or application-specific motion constraints. These approaches are effective in specific applications and environments but do not generalize well to the full multimotion estimation problem (MEP). This paper presents Multimotion Visual Odometry (MVO), a multimotion estimation pipeline that estimates the full SE(3) trajectory of every motion in the scene, including the sensor egomotion, without relying on appearance-based information. MVO extends the traditional visual odometry (VO) pipeline with multimotion segmentation and tracking techniques. It uses physically founded motion priors to extrapolate motions through temporary occlusions and identify the reappearance of motions through motion closure. Evaluations on real-world data from the Oxford Multimotion Dataset (OMD) and the KITTI Vision Benchmark Suite demonstrate that MVO achieves good estimation accuracy compared to similar approaches and is applicable to a variety of multimotion estimation challenges.


标题: Learning Interactive Real-World Simulators

作者: Mengjiao Yang, Yilun Du, Kamyar Ghasemipour

PubTime: 2024-01-13

Downlink: http://arxiv.org/abs/2310.06114v2

Project: https://universal-simulator.github.io

中文摘要: 基于互联网数据训练的生成模型彻底改变了文本、图像和视频内容的创建方式。生成模型的下一个里程碑也许是模拟现实体验,以响应人类、机器人和其他交互式智能体所采取的行动。真实世界模拟器的应用范围从游戏和电影中的可控内容创建,到纯粹在模拟中训练可直接部署到现实世界的具身智能体。我们探索了通过生成建模学习真实世界交互的通用模拟器的可能性。我们首先提出了一个重要的观察结果,即可用于学习真实世界模拟器的自然数据集通常在不同维度上是丰富的(例如,图像数据中的大量物体、机器人数据中的密集采样动作以及导航数据中的多样化运动)。通过仔细编排不同的数据集,让每个数据集提供整体体验的不同方面,我们可以从静态场景和物体出发,模拟高级指令(如"打开抽屉")和低级控制(如"按x、y移动")的视觉结果。我们使用模拟器来训练高级视觉语言策略和低级强化学习策略,在纯模拟训练后,每种策略都可以零样本部署到现实世界中。我们还表明,其他类型的智能,如视频字幕模型,也可以从模拟经验的训练中受益,从而开辟更广泛的应用。视频演示见https://universal-simulator.github.io。

摘要: Generative models trained on internet data have revolutionized how text, image, and video content can be created. Perhaps the next milestone for generative models is to simulate realistic experience in response to actions taken by humans, robots, and other interactive agents. Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world. We explore the possibility of learning a universal simulator of real-world interaction through generative modeling. We first make the important observation that natural datasets available for learning a real-world simulator are often rich along different dimensions (e.g., abundant objects in image data, densely sampled actions in robotics data, and diverse movements in navigation data). With careful orchestration of diverse datasets, each providing a different aspect of the overall experience, we can simulate the visual outcome of both high-level instructions such as “open the drawer” and low-level controls such as “move by x, y” from otherwise static scenes and objects. We use the simulator to train both high-level vision-language policies and low-level reinforcement learning policies, each of which can be deployed in the real world in zero shot after training purely in simulation. We also show that other types of intelligence such as video captioning models can benefit from training with simulated experience, opening up even wider applications. Video demos can be found at https://universal-simulator.github.io.

