Configuring and running word2vec复现.py in an environment without network access

File to run

# -*- coding: utf-8 -*-
import torch
import pandas as pd
import jieba
import torch.nn as nn
from tqdm import tqdm
from torch.utils.data import DataLoader,Dataset
from transformers import AutoTokenizer,AutoModel

def get_stop_word():
    with open("../data/baidu_stopwords.txt",encoding="utf-8") as f:
        return f.read().split("\n")

def read_data(n=3):
    import jieba.posseg as psg
    # Read the GBK-encoded CSV as a headerless column named "data".
    all_data = pd.read_csv("../data/数学原始数据.csv",names=["data"],encoding="gbk")
    all_data = all_data["data"].tolist()

    # Part-of-speech tags (from jieba.posseg) whose words are skipped.
    no_t = ["x","c","m","d","uj","r",""]

    result = []

    word_fre = {}

    for data in all_data:
        words = psg.lcut(data)

        new_word = []
        for word,t in words:
            if t in no_t:
                continue

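            # stop_words is not defined in this excerpt; it is presumably a
            # module-level collection built from get_stop_word().
            if word not in stop_words: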
                word_fre[word] = word_fre.get(word,0) + 1
                new_word.append(word)

        result.append(new_word)

    new_result = []

    for words in result:
        new_word = []
        for word in words:

            if word_fre[word]

Create and activate a virtual environment (optional)

python3 -m venv word2vec_offline
source word2vec_offline/bin/activate

Install the dependencies

pip install torch pandas jieba tqdm transformers

Download offline installation packages for the dependencies

On a machine with network access, run:

mkdir offline_pkgs
pip download torch pandas jieba tqdm transformers -d offline_pkgs

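This downloads every package, including transitive dependencies, into the offline_pkgs folder. By default pip download selects files for the platform and Python version it runs under, so the networked machine should match the offline one in operating system, CPU architecture, and Python version.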
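
Before copying the folder, it can help to confirm that each top-level package actually landed in it. The following is a minimal sketch, not part of the original workflow: the helper names normalize and missing_packages are made up for illustration, and the check only looks at file names. It does not verify transitive dependencies or platform compatibility.

```python
import re
from pathlib import Path

# Top-level packages named in the pip download command.
REQUIRED = ["torch", "pandas", "jieba", "tqdm", "transformers"]


def normalize(name):
    """Lower-case a name and collapse runs of '-', '_' and '.' into '_'."""
    return re.sub(r"[-_.]+", "_", name).lower()


def missing_packages(folder, required=REQUIRED):
    """Return the required packages with no matching file in `folder`.

    Hypothetical helper: a file counts as a match when its normalized name
    starts with the package name followed by a version digit, which covers
    the usual wheel and sdist file names.
    """
    files = [normalize(p.name) for p in Path(folder).iterdir() if p.is_file()]
    missing = []
    for name in required:
        pattern = re.compile(re.escape(normalize(name)) + r"_\d")
        if not any(pattern.match(f) for f in files):
            missing.append(name)
    return missing


if __name__ == "__main__":
    folder = Path("offline_pkgs")
    if folder.is_dir():
        print("missing:", missing_packages(folder) or "none")
    else:
        print("offline_pkgs not found in the current directory")
```

Requiring a digit after the package name keeps torchvision files from being counted as torch.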

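Copy the dependencies and project files to the offline environment

  • Copy the offline_pkgs folder to the offline machine
  • Copy your word2vec复现.py together with the ../data/ and ../model/ folders it needs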
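
Once the files are copied, a quick check from the script's directory can confirm that the relative paths resolve. This is only a sketch: the two data file paths come from the script excerpt and ../model from the copy list, while the helper name missing_paths is hypothetical.

```python
from pathlib import Path

# Paths used by the script excerpt, plus the model folder from the copy list.
# They are resolved relative to the directory the script is launched from.
EXPECTED = [
    "../data/baidu_stopwords.txt",
    "../data/数学原始数据.csv",
    "../model",
]


def missing_paths(paths=EXPECTED, base="."):
    """Return the entries of `paths` that do not exist relative to `base`."""
    base = Path(base)
    return [p for p in paths if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing files or folders:", ", ".join(missing))
    else:
        print("All expected paths are present.")
```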

3. Create a new virtual environment in the offline environment

python3 -m venv venv
source venv/bin/activate

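4. Install the dependencies offline

From the directory that contains the offline_pkgs folder (not from inside it, since --find-links is given as a relative path), run: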

pip install --no-index --find-links=offline_pkgs torch pandas jieba tqdm transformers

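If a dependency fails to install, install the failing dependency first, then the main packages. With --no-index, pip can only use files in offline_pkgs, so a failure usually means a package is missing from that folder or was downloaded for a different platform or Python version.

5. Check the installed dependencies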

pip list

Confirm that torch, pandas, jieba, tqdm, and transformers are all installed.
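
The same check can be done from Python, which also confirms that the packages are visible to the interpreter inside the virtual environment. A minimal sketch using only the standard library; the helper name missing_modules is made up for illustration:

```python
import importlib.util

# Top-level modules imported by the script.
REQUIRED_MODULES = ["torch", "pandas", "jieba", "tqdm", "transformers"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that the current interpreter cannot locate."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules can be found.")
```

find_spec only locates a module without importing it, so this runs quickly, but it will not catch a package that is installed and then fails at import time.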

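6. Run your code

Make sure you are inside the virtual environment and that the data and model paths are correct. The script opens ../data/ with relative paths, so launch it from the directory where those paths resolve: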

python word2vec复现.py
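
The script imports AutoTokenizer and AutoModel, but the excerpt is cut off before any model is loaded, so how it uses them is unknown. If it loads a model saved under ../model/, the sketch below shows one way to keep transformers from trying to reach the network. The function names and the directory name your_model_dir are placeholders, not taken from the original script.

```python
import os


def enable_hf_offline(env=os.environ):
    """Set the Hugging Face offline switches unless they are already set."""
    env.setdefault("HF_HUB_OFFLINE", "1")
    env.setdefault("TRANSFORMERS_OFFLINE", "1")
    return env


def load_local(model_dir):
    """Load a tokenizer and model strictly from a local directory.

    `model_dir` is a placeholder for wherever the model files were copied.
    """
    # Imported here so the environment variables are set before the import.
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModel.from_pretrained(model_dir, local_files_only=True)
    return tokenizer, model


if __name__ == "__main__":
    enable_hf_offline()
    # Hypothetical path; replace it with the actual folder under ../model/.
    # tokenizer, model = load_local("../model/your_model_dir")
```

With local_files_only=True, a missing file raises an error right away rather than triggering a download attempt.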
