【Superset3.0】更全面superset相关--配置邮件报告发送: 附件乱码以及导出文件(截屏图片)中文乱码问题

安装到配置邮件报告发送: 附件乱码以及导出文件、截屏图片中文乱码问题

  • 一、配置邮件发送
  • 二、容器或者系统配置中文环境/语言
          • 如下所示,查看当前环境
          • 1. 首先,给apt换源
          • 2. 安装对应字符集
            • 安装字符集:
          • 3. 安装对应中文字体
  • 三、解决导出CSV 中文乱码
  • 四、解决附件名中文乱码
          • 其他superset有用的参考博文

由于之前是第一次是接触superset,从0到成功投入使用,踩了太多的坑,特整理分享给大家!下面是亲测且全网最有效的一些方法!

一、配置邮件发送

superset版本3.0
安装方式: 官方+Dockerfie构建docker镜像,具体安装方法

#---------------------------------------------------------

# Superset specific config

#---------------------------------------------------------

# 查询行数限制
ROW_LIMIT = 200000

SUPERSET_WORKERS = 8

SUPERSET_WEBSERVER_PORT = 8088

#---------------------------------------------------------

#---------------------------------------------------------

# Flask App Builder configuration

#---------------------------------------------------------

# Your App secret key

SECRET_KEY = 'ZT2uRVAMPKpVkHM/QA1QiQlMuUgAi7LLo160AHA99aihEjp03m1HR6Kg'

# The SQLAlchemy connection string to your database backend

# This connection defines the path to the database that stores your

# superset metadata (slices, connections, tables, dashboards, ...).

# Note that the connection information to connect to the datasources

# you want to explore are managed directly in the web UI

#SQLALCHEMY_DATABASE_URI = 'sqlite:var/lib/superset/superset.db'
#白名单,csrf拦截
# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = False
TALISMAN_ENABLED = False 
# WTF_CSRF_ENABLED = True
# Add endpoints that need to be exempt from CSRF protection
#WTF_CSRF_EXEMPT_LIST = [
#    "superset.views.core.log",
#    "superset.views.core.explore_json",
#    "superset.charts.data.api.data",
#]
# 配置数据库
SQLALCHEMY_DATABASE_URI = 'mysql://数据库用户名:密码@172.17.0.2/superset?charset=utf8'

# Set this API key to enable Mapbox visualizations

MAPBOX_API_KEY = 'ZT2uRVAMPKpVkHM/QA1QiQ'
# 语言,汉化
# Setup default language
BABEL_DEFAULT_LOCALE = 'zh'
# Your application default translation path
BABEL_DEFAULT_FOLDER = 'superset/translations'
# The allowed translation for you app
LANGUAGES = {
    'en': {'flag': 'us', 'name': 'English'},
    'zh': {'flag': 'cn', 'name': 'Chinese'},
}

from dateutil import tz
# 时区
DRUID_TZ = tz.gettz('Asia/Shanghai')



#csv导出编码
CSV_EXPORT = {
  'encoding': 'gbk'
}
EXCEL_EXPORT = {
  "encoding": "gbk"
}

# Email configuration
ENABLE_SCHEDULED_EMAIL_REPORTS = True

EMAIL_NOTIFICATIONS = True
 
#SMTP_HOST = "smtp.qq.com"
#SMTP_STARTTLS = True
#SMTP_SSL = False
#SMTP_SSL_SERVER_AUTH = True
#SMTP_USER = "[email protected]"
#SMTP_PORT = 25
#SMTP_PASSWORD = "授权码"
#SMTP_MAIL_FROM = "[email protected]"

SMTP_HOST = "smtp.163.com"
SMTP_STARTTLS = True
SMTP_SSL = False
SMTP_SSL_SERVER_AUTH = True
SMTP_USER = "[email protected]"
SMTP_PORT = 25
SMTP_PASSWORD = "授权码"
SMTP_MAIL_FROM = "[email protected]"

# 匿名访问
PUBLIC_ROLE_LIKE_GAMMA = True

#from superset.typing import CacheConfig
from celery.schedules import crontab
 
FEATURE_FLAGS = {
    # 缩略图
    "THUMBNAILS": True,
    "THUMBNAILS_SQLA_LISTENERS": True,
    # 交叉过滤
    "DASHBOARD_NATIVE_FILTERS": True,
    "DASHBOARD_CROSS_FILTERS": True, # 开启交叉过滤
    "DASHBOARD_NATIVE_FILTERS_SET": True, # 过滤集
    # 开启导入、导出
    "VERSIONED_EXPORT": True,
    # 面板缓存
    "DASHBOARD_CACHE": True,
    # 警告和报告
    "ALERT_REPORTS": True,
	# 全局异步查询
    "GLOBAL_ASYNC_QUERIES": True,
    # 动态插件
    "DYNAMIC_PLUGINS": True,
    "LISTVIEWS_DEFAULT_CARD_VIEW": True,  # 开启卡片显示
    "ENABLE_REACT_CRUD_VIEWS": True, # 为所有FAB视图(列表、编辑、显示)启用替换React视图
    "ENABLE_TEMPLATE_PROCESSING": True, # 开启jinja模板
   # "ENABLE_EXPLORE_DRAG_AND_DROP": False,# 开启拖放查询面板
	"DISABLE_DATASET_SOURCE_EDIT": True,  # 禁用数据集编辑,变为只读模式
}


# redis缓存配置 
CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 168, # 7天
    'CACHE_KEY_PREFIX': 'superset_',
    'CACHE_REDIS_HOST': 'localhost',
    'CACHE_REDIS_PORT': 6379,
    'CACHE_REDIS_DB': 0,
    'CACHE_REDIS_URL': 'redis://localhost:6379/0'
}
 
DATA_CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 48, # 2day (1 day default (in secs))
    'CACHE_KEY_PREFIX': 'superset_results',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}
 
# Async selenium thumbnail task will use the following user
#THUMBNAIL_SELENIUM_USER = "admin"
#from superset.tasks.types import ExecutorType


# 发邮件截图身份
EMAIL_REPORTS_USER = "admin"

# 缩略图身份
THUMBNAIL_SELENIUM_USER = 'admin'
# 缩略图存放配置
THUMBNAIL_CACHE_CONFIG = {	
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 120*60*60,
    'CACHE_KEY_PREFIX': 'thumbnail_',
    'CACHE_NO_NULL_WARNING': True,
    'CACHE_REDIS_URL': 'redis://localhost:6379/0'

}

#CacheConfig = {
#    'CACHE_TYPE': 'redis',
#    'CACHE_DEFAULT_TIMEOUT': 3*60*60,
#    'CACHE_KEY_PREFIX': 'thumbnail_',
#    'CACHE_NO_NULL_WARNING': True,
#    'CACHE_REDIS_URL': 'redis://localhost:6379/0'
#}


# celery配置
REDIS_HOST = "localhost"
REDIS_PORT = "6379"
class CeleryConfig:
    broker_url = 'redis://%s:%s/0' % (REDIS_HOST, REDIS_PORT)
    imports = ('superset.sql_lab', "superset.tasks", "superset.tasks.thumbnails", )
    result_backend = 'redis://%s:%s/0' % (REDIS_HOST, REDIS_PORT)
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 600,
            'soft_time_limit': 600,
            'ignore_result': True,
        },
    }
    beat_schedule = {
        'reports.scheduler': {
            'task': 'reports.scheduler',
            'schedule': crontab(minute='*', hour='*'),
        },
        'reports.prune_log': {
            'task': 'reports.prune_log',
            'schedule': crontab(minute=0, hour=0),
        },
    }

CELERY_CONFIG = CeleryConfig
 
SCREENSHOT_LOCATE_WAIT = 200
SCREENSHOT_LOAD_WAIT = 600



# Slack configuration
SLACK_API_TOKEN = "xoxb-"
# The webdriver to use for generating reports. Use one of the following
#nohup celery --app=superset.tasks.celery_app:app worker --pool=gevent --concurrency=100  >celery.log &
# firefox
#   Requires: geckodriver and firefox installations
#   Limitations: can be buggy at times
# chrome:
#   Requires: headless chrome
#   Limitations: unable to generate screenshots of elements
WEBDRIVER_TYPE = "chrome"
 
# Additional args to be passed as arguments to the config object
# Note: these options are Chrome-specific. For FF, these should
# only include the "--headless" arg
WEBDRIVER_OPTION_ARGS = [
    "--force-device-scale-factor=2.0",
    "--high-dpi-support=2.0",
    "--headless",
    "--no-sandbox",
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--disable-setuid-sandbox",
    "--disable-extensions",

]



# The base URL to query for accessing the user interface
# WEBDRIVER_BASEURL = "http://192.168.1.100:8088/"
 # This is for internal use, you can keep http
WEBDRIVER_BASEURL="http://localhost:8088/"
# This is the link sent to the recipient, change to your domain eg. https://superset.mydomain.com
WEBDRIVER_BASEURL_USER_FRIENDLY="http://10.2.67.218:8089"

# On Redis
from cachelib.redis import RedisCache
RESULTS_BACKEND = RedisCache(
    host='localhost', port=6379, key_prefix='superset_results')

# 如果配置全局异步查询,重新添加为多余32位
#"GLOBAL_ASYNC_QUERIES": True
# 全局异步查询秘钥
GLOBAL_ASYNC_QUERIES_JWT_SECRET = "test-secret-change-me-new-key-added-later"


# 自定jinja内容
JINJA_CONTEXT_ADDONS = {
        'my_fun': lambda x:x*2
        }






SCHEDULED_QUERIES = {
    # This information is collected when the user clicks "Schedule query",
    # and saved into the `extra` field of saved queries.
    # See: https://github.com/mozilla-services/react-jsonschema-form
    'JSONSCHEMA': {
        'title': 'Schedule',
        'description': (
            'In order to schedule a query, you need to specify when it '
            'should start running, when it should stop running, and how '
            'often it should run. You can also optionally specify '
            'dependencies that should be met before the query is '
            'executed. Please read the documentation for best practices '
            'and more information on how to specify dependencies.'
        ),
        'type': 'object',
        'properties': {
            'output_table': {
                'type': 'string',
                'title': 'Output table name',
            },
            'start_date': {
                'type': 'string',
                'title': 'Start date',
                # date-time is parsed using the chrono library, see
                # https://www.npmjs.com/package/chrono-node#usage
                'format': 'date-time',
                'default': 'tomorrow at 9am',
            },
            'end_date': {
                'type': 'string',
                'title': 'End date',
                # date-time is parsed using the chrono library, see
                # https://www.npmjs.com/package/chrono-node#usage
                'format': 'date-time',
                'default': '9am in 30 days',
            },
            'schedule_interval': {
                'type': 'string',
                'title': 'Schedule interval',
            },
            'dependencies': {
                'type': 'array',
                'title': 'Dependencies',
                'items': {
                    'type': 'string',
                },
            },
        },
    },
    'UISCHEMA': {
        'schedule_interval': {
            'ui:placeholder': '@daily, @weekly, etc.',
        },
        'dependencies': {
            'ui:help': (
                'Check the documentation for the correct format when '
                'defining dependencies.'
            ),
        },
    },
    'VALIDATION': [
        # ensure that start_date <= end_date
        {
            'name': 'less_equal',
            'arguments': ['start_date', 'end_date'],
            'message': 'End date cannot be before start date',
            # this is where the error message is shown
            'container': 'end_date',
        },
    ],
    # link to the scheduler; this example links to an Airflow pipeline
    # that uses the query id and the output table as its name
    'linkback': (
        'https://airflow.example.com/admin/airflow/tree?'
        'dag_id=query_${id}_${extra_json.schedule_info.output_table}'
    ),
}




主要上面的邮箱配置处!

在这里插入代码片

之后可能会出现几个问题,一个是发送csv文件时的图表标题需要为英文,否则邮件会发送但没有附件;另一个时发送png图片时,中文字符为方框无法展示,又或者附件里中文乱码。

二、容器或者系统配置中文环境/语言

中文是方框,是因为没有配置容器中文环境

如下所示,查看当前环境
root@ca04d687222c:~# locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8

lang需要设置为 zh_CN.utf8 才能使浏览器正常显示中文

首先,确认locale 是否已安装对应的字符集:

root@ca04d687222c:~# locale -a
C
C.UTF-8
POSIX

由于容器默认是精简版,所以没有安装,而要在容器中安装对应的字符集!

1. 首先,给apt换源

查看linux发行版本

cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Debian GNU/Linux 12 (bookworm)版本,该版本下默认安装工具是apt,如果不换源,将体验b/s的下载速度,非常慢!

所以vim 打开/etc/apt/sources.list,替换为如下bookworm版本的源,如选错镜像,apt会各种冲突,什么都装不了。修改完源后,需要apt update一下,让源生效:

vim /etc/apt/sources.list

deb http://mirrors.163.com/debian/ bookworm main non-free contrib
deb http://mirrors.163.com/debian/ bookworm-updates main non-free contrib
deb http://mirrors.163.com/debian/ bookworm-backports main non-free contrib
deb-src http://mirrors.163.com/debian/ bookworm main non-free contrib
deb-src http://mirrors.163.com/debian/ bookworm-updates main non-free contrib
deb-src http://mirrors.163.com/debian/ bookworm-backports main non-free contrib
deb http://mirrors.163.com/debian-security/ bookworm/updates main non-free contrib
deb-src http://mirrors.163.com/debian-security/ bookworm/updates main non-free contrib

update
update 是同步 /etc/apt/sources.list 和 /etc/apt/sources.list.d 中列出的源的索引,这样才能获取到最新的软件包。
upgrade
upgrade 是升级已安装的所有软件包,升级之后的版本就是本地索引里的,因此,在执行 upgrade 之前一定要执行 update, 这样才能是最新的。
An update should always be performed before an upgrade or dist-upgrade.
upgrade is used to install the newest versions of all packages currently installed on the system from the sources enumerated in /etc/apt/sources.list. Packages currently installed with new versions available are retrieved and upgraded.

2. 安装对应字符集

首先,安装locals

apt install locales
安装字符集:
dpkg-reconfigure locales

选择zh_CN.UTF-8 UTF-8的前面序号,回车确认!!
安装完成后需要启用,用时久一点。

在etc/default/locale配置文件里进行添加:

LANG=zh_CN.UTF-8

但是发现,每次进入容器都得执行 source etc/default/locale,非常麻烦,所以修改,找到/root/.bashrc

vim /root/.bashrc

# 添加
source /etc/default/locale
3. 安装对应中文字体
#apt-get install xfonts-intl-chinese
#apt-get install xfonts-intl-chinese wqy*

更新字体库

fc-cache -fv

使用 fc-list 指令可以查看安装的字体

三、解决导出CSV 中文乱码

superset 导出CSV 默认编码为utf-8,在导出包含中文的文本CSV会有乱码问题,解决方案如下:

在安装目录下执行grep -r “superset/config.py” ./

修改 superset/config.py,将UTF-8修改成GBK,如下所示:

CSV_EXPORT = {
'encoding': 'gbk',
}

修改后重启,如果还有问题,继续执行第二步:
(以下修改,superset3.0不需要,其他不清楚)

grep -r "superset/views/core.py" ./

修改superset/views/core.py

csv = df.to_csv(index=False, **config.get('CSV_EXPORT'))
 
response = CsvResponse(csv, mimetype='text/csv')

将Response 改成CsvResponse

四、解决附件名中文乱码

需要修改源代码!
修改superset/utils/core.py 882行(版本不同,行数可能变化):

# Attach any files passed directly
for name, body in (data or {}).items():
    att = MIMEApplication(
            body,
            Name=name,
            )
    att.add_header('Content-Disposition', 'attachment',filename=('gbk', '', name))
    msg.attach(att)
    
    '''
    msg.attach(
        MIMEApplication(
            body, Content_Disposition="attachment; filename='%s'" % name, Name=name
        )
    )
    '''
# Attach files by reading them from disk
for fname in files or []:
    basename = os.path.basename(fname)
    with open(fname, "rb") as f:
        att = MIMEApplication(
                f.read(),
                Name=basename
            )
        att.add_header('Content-Disposition', 'attachment',filename=('gbk', '', basename))
        msg.attach(att)
        '''
        msg.attach(
            MIMEApplication(
                f.read(),
                Content_Disposition="attachment; filename='%s'" % basename,
                Name=basename,
            )
        )
        '''

重启即可!

其他superset有用的参考博文
  • Superset安装与汉化
  • superset警报与报告
  • superset缩略图
  • 旧版本superset 完全汉化
  • SuperSet logo修改、导出csv中文乱码、sql查询超时问题解决(默认30s)问题处理
  • 更改superset用户logo
  • Jinja模板功能及设置动态参数

你可能感兴趣的:(编程笔记(问题解决),数据分析,部署&Linux运维,数据库,数据分析,superset,BI)