Single-cell data integration: resolving "ValueError: cannot reindex from a duplicate axis" when concatenating AnnData objects

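Project scenario:

When analyzing single-cell data with the scanpy package, you often need to integrate data from several samples, that is, combine multiple AnnData objects into a single AnnData object.

For example, one workable way to combine adata_1 and adata_2 into adatas is: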

import anndata as ad

# adata_1 and adata_2 are existing AnnData objects, one per sample
adatas = [adata_1, adata_2]
adatas = ad.concat(adatas, merge="same")

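The resulting adatas stacks the observations (obs) of adata_1 and adata_2. Because join defaults to "inner", only the variables present in both objects are kept, and merge="same" keeps the var annotation columns whose values are identical in both objects.

(For details on how to use anndata.concat(), see the AnnData Concatenation documentation.)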


Problem description:

During one concatenation, the following error was raised: "ValueError: cannot reindex from a duplicate axis"

The full traceback is as follows:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_10619/2071572554.py in <module>
----> 1 adatas = ad.concat(adata, merge = "same")

~/gpfs1/xuzk/Anaconda/envs/py3/lib/python3.8/site-packages/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
    836 
    837     # Annotation for other axis
--> 838     alt_annot = merge_dataframes(
    839         [getattr(a, alt_dim) for a in adatas], alt_indices, merge
    840     )

~/gpfs1/xuzk/Anaconda/envs/py3/lib/python3.8/site-packages/anndata/_core/merge.py in merge_dataframes(dfs, new_index, merge_strategy)
    548     dfs: Iterable[pd.DataFrame], new_index, merge_strategy=merge_unique
    549 ) -> pd.DataFrame:
--> 550     dfs = [df.reindex(index=new_index) for df in dfs]
    551     # New dataframe with all shared data
    552     new_df = pd.DataFrame(merge_strategy(dfs), index=new_index)

~/gpfs1/xuzk
