Is "glew" a tool library commonly used by RL engineers?
What is this codebase mainly used for?
Based on the main functionality of this codebase, could you write documentation for the selected lines of code in Markdown, following the format and style of the PyTorch documentation?
```bash
conda create -p ~/work/data/envs/text2reward python=3.7
conda activate /home/featurize/work/data/envs/text2reward
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 \
    torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -e ManiSkill2
pip install stable-baselines3==1.8.0 wandb tensorboard \
    -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install langchain chromadb==0.4.0 \
    -i https://pypi.tuna.tsinghua.edu.cn/simple
```
```bash
conda create -n text2reward_codegen python=3.8
pip install transformers
```
Set the `OPENAI_API_KEY` environment variable.
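For example (the key value is a placeholder):

```bash
export OPENAI_API_KEY="sk-..."
```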
See the blog post "【Text2Reward】多工作区编辑" (on multi-workspace editing).
The error message is as follows:

```text
>>> import mujoco_py
~/anaconda3/envs/text2reward/lib/python3.7/site-packages/mujoco_py/gl/eglshim.c:4:10: fatal error: GL/glew.h: No such file or directory
    4 | #include <GL/glew.h>
      |          ^~~~~~~~~~~
compilation terminated.
Traceback (most recent call last):
  File "~/anaconda3/envs/text2reward/lib/python3.7/site-packages/setuptools/_distutils/unixccompiler.py", line 186, in _compile
    self.spawn(compiler_so + cc_args + [src, '-o', obj] + extra_postargs)
  File "~/anaconda3/envs/text2reward/lib/python3.7/site-packages/setuptools/_distutils/ccompiler.py", line 1007, in spawn
    spawn(cmd, dry_run=self.dry_run, **kwargs)
  File "~/anaconda3/envs/text2reward/lib/python3.7/site-packages/setuptools/_distutils/spawn.py", line 71, in spawn
    "command {!r} failed with exit code {}".format(cmd, exitcode)
distutils.errors.DistutilsExecError: command '/usr/bin/gcc' failed with exit code 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/anaconda3/envs/text2reward/lib/python3.7/site-packages/mujoco_py/__init__.py", line 2, in <module>
    from mujoco_py.builder import cymj, ignore_mujoco_warnings, functions, MujocoException
  ...  # intermediate frames omitted
  File "~/anaconda3/envs/text2reward/lib/python3.7/site-packages/setuptools/_distutils/unixccompiler.py", line 188, in _compile
    raise CompileError(msg)
distutils.errors.CompileError: command '/usr/bin/gcc' failed with exit code 1
```

As the error message shows, the GL development headers are missing; installing them fixes the problem.
Gemini 2.0 Flash: GLEW (the OpenGL Extension Wrangler Library) is an OpenGL extension-management library, used to manage and access newer OpenGL features.
Note: `glew` can be installed with conda; refer to the official command. Alternatively, install it via apt:

```bash
sudo apt-get install libglew-dev libosmesa6-dev
```
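If you prefer conda, the `glew` package on the conda-forge channel should also work (the channel choice here is an assumption; adjust to your setup):

```bash
conda install -c conda-forge glew
```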
The error message is as follows:

```text
>>> import mujoco_py
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/anaconda3/envs/myenv/lib/python3.7/site-packages/mujoco_py/__init__.py", line 2, in <module>
    from mujoco_py.builder import cymj, ignore_mujoco_warnings, functions, MujocoException
  File "~/anaconda3/envs/myenv/lib/python3.7/site-packages/mujoco_py/builder.py", line 504, in <module>
    cymj = load_cython_ext(mujoco_path)
  File "~/anaconda3/envs/myenv/lib/python3.7/site-packages/mujoco_py/builder.py", line 110, in load_cython_ext
    cext_so_path = builder.build()
  File "~/anaconda3/envs/myenv/lib/python3.7/site-packages/mujoco_py/builder.py", line 226, in build
    built_so_file_path = self._build_impl()
  File "~/anaconda3/envs/myenv/lib/python3.7/site-packages/mujoco_py/builder.py", line 297, in _build_impl
    fix_shared_library(so_file_path, 'libOpenGL.so', 'libOpenGL.so.0')
  File "~/anaconda3/envs/myenv/lib/python3.7/site-packages/mujoco_py/builder.py", line 154, in fix_shared_library
    subprocess.check_call(['patchelf', '--remove-rpath', so_file])
  File "~/anaconda3/envs/myenv/lib/python3.7/subprocess.py", line 358, in check_call
    retcode = call(*popenargs, **kwargs)
  File "~/anaconda3/envs/myenv/lib/python3.7/subprocess.py", line 339, in call
    with Popen(*popenargs, **kwargs) as p:
  File "~/anaconda3/envs/myenv/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "~/anaconda3/envs/myenv/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'patchelf': 'patchelf'
```

This simply means the `patchelf` tool is missing; install it and the problem goes away.
Note: `patchelf` can be installed with conda.
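For example, from the conda-forge channel:

```bash
conda install -c conda-forge patchelf
```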
Hailuo AI: Here, `patchelf` is a utility for modifying ELF executables and libraries. In your error message, MuJoCo uses `patchelf` at build time to modify the runtime search path (rpath) of shared libraries (mainly `.so` dynamic libraries).
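For illustration only, the rpath manipulation looks like this (`cymj.so` is a placeholder for the built extension; `--remove-rpath` is the call visible in the traceback above):

```bash
patchelf --print-rpath cymj.so     # inspect the current rpath
patchelf --remove-rpath cymj.so    # strip it, as mujoco_py's fix_shared_library does
```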
The cause: the program had been interrupted externally (Ctrl+C) while running, so wandb failed to shut down cleanly.

```text
# Error messages like the following appeared six times...
wandb: Using wandb-core as the SDK backend. Please refer to https://docs.wandb.ai/ for more information.
wandb: Currently logged in as: songyuc. Use `wandb login --relogin` to force relogin
wandb: ERROR failed to upsert bucket: returned error 403 Forbidden: {"errors":[{"message":"permission denied","path":["upsertBucket"],"extensions":{"code":"PERMISSION_ERROR"}}],"data":{"upsertBucket":null}}
Traceback (most recent call last):
  File "sac.py", line 132, in <module>
    settings=None if args.reward_path is None else wandb.Settings(code_dir=args.reward_path[:-11]))
  File "~/anaconda3/envs/text2reward/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 1270, in init
    wandb._sentry.reraise(e)
  File "~/anaconda3/envs/text2reward/lib/python3.7/site-packages/wandb/analytics/sentry.py", line 161, in reraise
    raise exc.with_traceback(sys.exc_info()[2])
  File "~/anaconda3/envs/text2reward/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 1256, in init
    return wi.init()
  File "~/anaconda3/envs/text2reward/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 847, in init
    raise error
wandb.errors.errors.CommError: failed to upsert bucket: returned error 403 Forbidden: {"errors":[{"message":"permission denied","path":["upsertBucket"],"extensions":{"code":"PERMISSION_ERROR"}}],"data":{"upsertBucket":null}}
```

The key error here is "PERMISSION_ERROR": it occurs because we have no access to the official xlang-ai/text2reward team's entity (`code4reward`).
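A minimal sketch of the workaround, assuming you log runs under an entity you can write to (`my-entity` and the project name below are placeholders):

```python
import wandb

# Log under your own account/team instead of the authors' entity.
run = wandb.init(
    project="text2reward",   # placeholder project name
    entity="my-entity",      # your own wandb username or team
)
```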
```text
...
-----------------------------------
| rollout/           |            |
|    ep_len_mean     | 200        |
|    ep_rew_mean     | -1300.2421 |
| time/              |            |
|    episodes        | 40000      |
|    fps             | 144        |
|    time_elapsed    | 55330     |
|    total_timesteps | 8000000    |
-----------------------------------
WARNING - mani_skill2 is not installed with git.
Traceback (most recent call last):
  File "sac.py", line 181, in <module>
    success = np.array(ep_lens) < eval_env.env.env._max_episode_steps
  File "/stable_baselines3/common/vec_env/base_vec_env.py", line 313, in __getattr__
    return self.getattr_recursive(name)
  File "/stable_baselines3/common/vec_env/base_vec_env.py", line 338, in getattr_recursive
    attr = getattr(self.venv, name)
AttributeError: 'SubprocVecEnv' object has no attribute 'env'
```
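Here `sac.py` reaches into the vectorized environment with `.env.env`, but `SubprocVecEnv` does not expose an `env` attribute. A possible fix sketch using stable-baselines3's `VecEnv.get_attr` (this assumes the wrapped sub-environments actually define `_max_episode_steps`):

```python
import numpy as np

# Query the attribute from the sub-environments instead of `.env.env`;
# get_attr returns one value per parallel environment.
max_steps = eval_env.get_attr("_max_episode_steps")[0]
success = np.array(ep_lens) < max_steps
```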
```text
...
Traceback (most recent call last):
  File "few_shot_exp.py", line 65, in <module>
    code_generator = FewShotGenerator(
  File "~/Documents/Research/Robot_study/text2reward/code_generation/single_flow/few_shot/generation.py", line 30, in __init__
    self.example_selector = SemanticSimilarityExampleSelector.from_examples(
  File "~/anaconda3/envs/text2reward_codegen/lib/python3.8/site-packages/langchain_core/example_selectors/semantic_similarity.py", line 170, in from_examples
    vectorstore = vectorstore_cls.from_texts(
  File "~/anaconda3/envs/text2reward_codegen/lib/python3.8/site-packages/openai/_base_client.py", line 1027, in request
    return self._retry_request(
  File "~/anaconda3/envs/text2reward_codegen/lib/python3.8/site-packages/openai/_base_client.py", line 1105, in _retry_request
    return self._request(
  File "~/anaconda3/envs/text2reward_codegen/lib/python3.8/site-packages/openai/_base_client.py", line 1037, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.
```
Other issues encountered:

- The generated `specific.py` script cannot be used as-is: `compute_dense_reward(self, action) -> float` needs to be extracted as a top-level function;
- an `AttributeError` is likely caused by the generated code referencing a robot attribute that does not exist, in which case the reward-function generation should be re-run;
- the `openai.APIConnectionError` above is probably caused by not having configured a proxy (see the sketch below).
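If it is indeed a proxy issue, exporting the proxy variables before running the script usually helps (the address and port below are placeholders for your own proxy):

```bash
export HTTP_PROXY=http://127.0.0.1:7890
export HTTPS_PROXY=http://127.0.0.1:7890
```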
`reward_path`: path to the file containing a custom dense-reward function. When this argument is provided, the `compute_dense_reward` function from the given file overrides the environment's default reward computation.
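For illustration, a hypothetical invocation (the flag name is inferred from `args.reward_path` in the traceback above; the path is a placeholder following the `.../specific.py` layout these notes use):

```bash
python sac.py --reward_path ./results/LiftCube-v0/specific.py
```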
`ContinuousTaskWrapper`

A wrapper that adds a maximum-episode-step limit and custom dense-reward computation to continuous-control tasks, while preserving the standard `step` and `reset` interfaces.

Attributes:

- `pre_obs`: the initial observation returned by `reset()`.

Methods:

- `reset(self) -> np.ndarray`: resets `_elapsed_steps = 0`, calls the underlying environment's `reset()`, and stores and returns the initial observation `pre_obs`.
- `compute_dense_reward(self, action: np.ndarray) -> float`: hook for a custom dense reward; its implementation is injected via `exec()`.
- `step(self, action: np.ndarray) -> (obs, reward, done, info)`: if `compute_dense_reward` is defined, its return value is used as the reward; otherwise the environment's original reward is kept. `done` becomes `True` once the step limit is reached (with `info["TimeLimit.truncated"]` set to `True`), otherwise `False`.

Info dict:

- `"TimeLimit.truncated"` (bool): whether the episode was terminated because of the step limit.

Example:

```python
import gym
from run_maniskill.ppo import ContinuousTaskWrapper

# Create the base environment
base_env = gym.make("LiftCube-v0",
                    obs_mode="state",
                    reward_mode="dense",
                    control_mode="pd_ee_delta_pose")

# Wrap it: force termination after at most 100 steps
env = ContinuousTaskWrapper(base_env, max_episode_steps=100)

# To use a custom dense-reward function, inject it in your script like this:
# with open("my_reward.py") as f:
#     code = f.read()
# namespace = {}
# exec(code, namespace)
# ContinuousTaskWrapper.compute_dense_reward = namespace['compute_dense_reward']

obs = env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    if done:
        break
```
This keeps the environment interface compatible while making it easy to plug in custom dense-reward logic and to manage episode length uniformly.
A zero-shot reward-function code-generation experiment script. The module uses large language models (such as GPT-4) to automatically generate reward-function code for different robot manipulation tasks, without any additional example data.
Supported tasks:

- `LiftCube-v0`: grasp a cube and lift it to a specified height
- `PickCube-v0`: grasp a cube and move it to a target position
- `StackCube-v0`: grasp cube A and stack it on top of cube B
- `TurnFaucet-v0`: turn a faucet handle
- `OpenCabinetDoor-v1`: open a cabinet door with a single-arm mobile robot
- `OpenCabinetDrawer-v1`: open a drawer with a single-arm mobile robot
- `PushChair-v1`: push a swivel chair with a dual-arm mobile robot

`LiftCube_Env`:
- `self.cubeA`: there is no specific "cubeA" in LiftCube, because there is only one cube;
- `self.cubeB`: same as above.
- `RigidObject`: a fictional type that does not exist.
- `ArticulateObject`: a fictional type that does not exist.
- `LinkObject`: a fictional type that does not exist.
- `PandaRobot`: a fictional type that does not exist.
We have now identified the complete list of fictional types in this codebase:
- `RigidObject` - fictional
- `ObjectPose` - fictional

`ZeroShotGenerator` generates reward-function code for robot reinforcement-learning tasks zero-shot via an LLM. The class supports multiple language-model backends, including the OpenAI GPT series and the open-source Code Llama and Llama 2 series.
```python
class ZeroShotGenerator:
    def __init__(self, info_prompt: PromptTemplate, model_name="gpt-4", **kwargs) -> None
```
Parameters:

- `info_prompt` (`PromptTemplate`): the prompt template supplying the environment and task context that guides code generation.
- `model_name` (`str`, optional): name of the backend model, defaulting to `"gpt-4"`. Supported OpenAI models include `"gpt-3.5-turbo"`, `"gpt-3.5-turbo-0613"`, `"gpt-4"`, `"gpt-4-0314"`, and `"gpt-4-0613"`; supported open-source models include `"codellama_34b"` and `"llama_2_70b"`.
- `kwargs` (`dict`, optional): extra keyword arguments forwarded to the underlying model wrapper (e.g. `temperature`).
`generate_code(instruction: str, map_dict: dict) -> Tuple[str, str]`

The core generation method: produces the reward-function code that corresponds to a task instruction.

Parameters:

- `instruction` (`str`): a robot-task instruction in natural language, e.g. "pick up the red cube and place it on the table".
- `map_dict` (`dict`): a mapping from generic terms to environment-specific terms, used to convert the generated generic code into code for the specific environment.

Returns: a tuple of the general code and the environment-specific code.

Behavior:

- Strips the Markdown code-fence markers (the ```` ```python ```` and ```` ``` ```` markers) from the raw model output (a sketch follows below).
- Uses `RewardFunctionConverter` to convert the general code into the environment-specific implementation.
- Executes the instruction-driven generation task through a predefined processing chain.
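A minimal sketch of the fence-stripping step, as an illustration only (the actual implementation in the codebase may differ):

```python
FENCE = "`" * 3  # the three-backtick Markdown fence marker

def strip_code_fences(raw: str) -> str:
    """Strip leading/trailing Markdown code-fence markers from LLM output."""
    code = raw.strip()
    if code.startswith(FENCE + "python"):
        code = code[len(FENCE + "python"):]
    elif code.startswith(FENCE):
        code = code[len(FENCE):]
    if code.endswith(FENCE):
        code = code[:-len(FENCE)]
    return code.strip()
```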
Parameters:

- `instruction` (`str`): the task instruction in natural-language form, describing the required output.
```python
from langchain.prompts import PromptTemplate
from code_generation.single_flow.zero_shot.generation import ZeroShotGenerator

# Create the prompt template
prompt_template = PromptTemplate(
    input_variables=["instruction"],
    template="Generate a reward function for: {instruction}"
)

# Initialize the generator
generator = ZeroShotGenerator(
    info_prompt=prompt_template,
    model_name="gpt-4",
    temperature=0.1
)

# Define the mapping dictionary
map_dict = {
    "robot": "self.robot",
    "target_object": "self.cube",
    "goal_position": "self.goal_pose"
}

# Generate the reward-function code
instruction = "move the cube to the target position"
general_code, specific_code = generator.generate_code(instruction, map_dict)

print("General code:")
print(general_code)
print("\nSpecific code:")
print(specific_code)
```
The class supports different model backends via conditional branching:

- a `ChatOpenAI` wrapper for the official OpenAI API;
- a `HuggingFaceLLM` wrapper for locally deployed models.

The code-generation process includes robust error handling.
The generated code is post-processed by `RewardFunctionConverter`.
- `info_prompt` should contain enough context to guide the model toward generating valid code.
- `map_dict` should provide a complete mapping from generic terms to the concrete environment's terms.