February 13, 2021

Including Additional Files With Python Packages

A few weeks back I outlined the process for building and distributing a Python package using setuptools. However, that process only distributes the Python code needed to run your package. In this post, I will go over what is involved in distributing non-Python files with a Python package.

Default Behavior

Let's take a look at my example project redditmon that was created for the previous post. If you recall, at the end of the packaging process, the file structure of the git repo looked like this:

redditmon
├── redditmon
│   ├── __init__.py
│   └── redditmon.py
├── .gitignore
├── example_redditmon.config.ini
├── LICENSE
├── README.md
└── setup.py

Currently everything in our package directory is a Python file, so when we build our package using python3 setup.py sdist bdist_wheel, setuptools generates a build directory at the same level as our setup.py file:

redditmon
├── build
│   ├── bdist.linux-x86_64
│   └── lib
│       └── redditmon
│           ├── __init__.py
│           └── redditmon.py
└── setup.py
NOTE: The setup.py file wasn't regenerated; it is just included here to keep the directory context.
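Peeking at build/lib works, but the artifact you actually upload is the wheel that bdist_wheel writes to dist/. A wheel is just a zip archive, so a few lines of standard-library Python can list exactly what made it in. This is a minimal sketch: the helper name list_wheel_files is my own, and the script assumes you run it from the project root after a build.

```python
import zipfile
from pathlib import Path


def list_wheel_files(wheel_path):
    """Return the sorted names of all files stored in a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel_path) as whl:
        # Skip any directory entries (names ending in "/") and keep only files.
        return sorted(name for name in whl.namelist() if not name.endswith("/"))


if __name__ == "__main__":
    dist = Path("dist")  # bdist_wheel's default output directory
    wheels = sorted(dist.glob("*.whl")) if dist.is_dir() else []
    for wheel in wheels:
        print(wheel.name)
        for name in list_wheel_files(wheel):
            print("   ", name)
```

At this stage, for redditmon you would expect only the .py modules under redditmon/ plus the wheel's .dist-info metadata.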

Adding Non-Python Files

So far so good, right? Well, what happens if we have a non-Python file we want to include with our project? Good examples for network automation would be TextFSM templates that our script relies on, or some text/CSV data needed for processing. For demonstration purposes, I have added a file called IncludeMe.md to the redditmon package directory. Here's what our repo looks like now:

redditmon
├── redditmon
│   ├── IncludeMe.md
│   ├── __init__.py
│   └── redditmon.py
├── .gitignore
├── example_redditmon.config.ini
├── LICENSE
├── README.md
└── setup.py

So what happens when we re-run python3 setup.py sdist bdist_wheel and look inside the build directory?

redditmon
├── build
│   ├── bdist.linux-x86_64
│   └── lib
│       └── redditmon
│           ├── __init__.py
│           └── redditmon.py
└── setup.py

Hmmm. Everything looks the same as before. What gives?

Using package_data

To include IncludeMe.md, all we have to do is modify our setup.py file to tell setuptools to add it to the package. This is pretty simple: we just pass this dictionary as the package_data parameter of setuptools.setup():

setuptools.setup(
    <SNIP>
    package_data={'redditmon': ['IncludeMe.md']},
)

Now when we re-run python3 setup.py sdist bdist_wheel, look what happens:

redditmon
├── build
│   ├── bdist.linux-x86_64
│   └── lib
│       └── redditmon
│           ├── __init__.py
│           ├── IncludeMe.md
│           └── redditmon.py
└── setup.py
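Getting the file into the package is only half of the job; your code also has to find it once the package is installed somewhere like site-packages. One option, not part of the setup above and requiring Python 3.9+, is the standard library's importlib.resources (on older versions, the importlib_resources backport on PyPI offers the same files() API). A minimal sketch; read_package_text is a hypothetical helper name:

```python
from importlib.resources import files


def read_package_text(package, resource, encoding="utf-8"):
    """Read a text data file that ships inside an importable package.

    Requires Python 3.9+ for importlib.resources.files().
    """
    # files() locates the package the same way import does, so this finds the file
    # whether the package is installed or imported from a source checkout.
    return files(package).joinpath(resource).read_text(encoding=encoding)
```

With IncludeMe.md listed in package_data, read_package_text("redditmon", "IncludeMe.md") would return its contents, because package_data places the file next to the package's __init__.py.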

Extra Knobs and Details

Nice! Now that we know that it works, let's go over some of the details of the dictionary.

First up is the key. This is the name of the package that we want to include the files in. This is pretty simple for a single-package library like redditmon; however, if you have a multi-package library with sub-packages, the key uses the same dotted notation you use when importing (for example, pkg1.sub_pkg1). The key can also be an empty string to apply to all packages. More on that in a bit.
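Under the simple layout used in this post (package directories sitting directly next to setup.py, with no package_dir remapping or src/ layout), a dotted key maps to a directory by swapping dots for path separators. The helper below only illustrates that mapping; it is not a setuptools API, and the name key_to_directory is hypothetical.

```python
from pathlib import PurePosixPath


def key_to_directory(package_key):
    """Illustrative only: map a package_data key to the directory it refers to.

    Assumes packages live directly under the project root (no src/ layout, no package_dir).
    """
    if package_key == "":
        # The empty key is not a directory; it means "apply these patterns to every package".
        raise ValueError("'' applies to all packages rather than one directory")
    return PurePosixPath(*package_key.split("."))
```

For example, key_to_directory("pkg1.sub_pkg1") gives pkg1/sub_pkg1, so an entry like 'IncludeMe.md' under that key refers to pkg1/sub_pkg1/IncludeMe.md.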

Since this is a dictionary, we of course need a value to go with our key. Setuptools expects a list here: a list of strings giving the paths (or glob patterns) of the files to include with the package. Each path is relative to the directory of the package named in the key. That doesn't matter much in a simple project (other than being a bit confusing at first), but it becomes very important in more complex projects. Take this package structure, for example:

example_library
├── pkg1
│   ├── sub_pkg1
│   │   ├── __init__.py
│   │   ├── DoNotIncludeMe.txt
│   │   ├── IncludeMe.md
│   │   ├── IncludeMe2.md
│   │   └── sub_pkg1.py
│   ├── __init__.py
│   ├── IncludeMe.md
│   └── pkg1.py
├── pkg2
│   ├── __init__.py
│   ├── file1.foo
│   ├── file2.bar
│   └── pkg2.py
└── setup.py

Here we have multiple packages and a sub-package, each with some files that we want to include and some that we don't. This setup.py file demonstrates the various ways we can include (and choose not to include) files:

import setuptools

setuptools.setup(
    name="example_library",
    version="1.0.0",
    packages=setuptools.find_packages(),
    package_data={
        'pkg1.sub_pkg1': ['*.md'],
        'pkg1': ['IncludeMe.md'],
        'pkg2': ['file1.foo', 'file2.bar'],
    },
    python_requires='>=3.6',
)

Running setuptools again results in this:

build
├── bdist.linux-x86_64
└── lib
    ├── pkg1
    │   ├── sub_pkg1
    │   │   ├── __init__.py
    │   │   ├── IncludeMe.md
    │   │   ├── IncludeMe2.md
    │   │   └── sub_pkg1.py
    │   ├── __init__.py
    │   ├── IncludeMe.md
    │   └── pkg1.py
    └── pkg2
        ├── __init__.py
        ├── file1.foo
        ├── file2.bar
        └── pkg2.py

Let's break down all of the things we added to package_data. First off, we include all of the .md files in sub_pkg1 with the 'pkg1.sub_pkg1': ['*.md'] key:value pair. By specifying the *.md glob pattern, we matched all files ending in .md in pkg1.sub_pkg1. Since the glob pattern doesn't match DoNotIncludeMe.txt, that file is left out of our package. For pkg1, we only specify IncludeMe.md, and for pkg2, we name multiple files (via list entries) to be included.
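To make the matching rules concrete, here is a deliberately simplified model of how the package_data above selects files: each pattern is matched against the file names directly inside that package's own directory, and a key covers only that exact package (the '' key covers all of them). This is not setuptools' implementation; setuptools globs the real filesystem and also accepts patterns containing subdirectory paths. TREE and select_data_files are made up for illustration, with file names copied from the example tree.

```python
from fnmatch import fnmatchcase

# Non-Python files in each package of the example_library tree above.
TREE = {
    "pkg1": ["IncludeMe.md"],
    "pkg1.sub_pkg1": ["DoNotIncludeMe.txt", "IncludeMe.md", "IncludeMe2.md"],
    "pkg2": ["file1.foo", "file2.bar"],
}


def select_data_files(tree, package_data):
    """Simplified model: return {package: sorted data files selected by package_data}."""
    selected = {}
    for package, names in tree.items():
        # '' patterns apply to every package; a named key applies to that package only,
        # so patterns under 'pkg1' do not reach into pkg1.sub_pkg1.
        patterns = package_data.get("", []) + package_data.get(package, [])
        selected[package] = sorted(
            name for name in names if any(fnmatchcase(name, pat) for pat in patterns)
        )
    return selected


if __name__ == "__main__":
    package_data = {
        "pkg1.sub_pkg1": ["*.md"],
        "pkg1": ["IncludeMe.md"],
        "pkg2": ["file1.foo", "file2.bar"],
    }
    for package, names in select_data_files(TREE, package_data).items():
        print(package, names)
```

Running it prints one line per package, and the selected files line up with the build listing above: DoNotIncludeMe.txt is the only data file left out.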

What if I want to include everything? Easy! Just pass package_data={'': ['*']}. This adds every file in every package, resulting in this:

build
├── bdist.linux-x86_64
└── lib
    ├── pkg1
    │   ├── sub_pkg1
    │   │   ├── __init__.py
    │   │   ├── DoNotIncludeMe.txt
    │   │   ├── IncludeMe.md
    │   │   ├── IncludeMe2.md
    │   │   └── sub_pkg1.py
    │   ├── __init__.py
    │   ├── IncludeMe.md
    │   └── pkg1.py
    └── pkg2
        ├── __init__.py
        ├── file1.foo
        ├── file2.bar
        └── pkg2.py

Hmmm. What if we only want to include the .md files in all packages? Simple enough with package_data={'': ['*.md']}, which gives us:

build
├── bdist.linux-x86_64
└── lib
    ├── pkg1
    │   ├── sub_pkg1
    │   │   ├── __init__.py
    │   │   ├── IncludeMe.md
    │   │   ├── IncludeMe2.md
    │   │   └── sub_pkg1.py
    │   ├── __init__.py
    │   ├── IncludeMe.md
    │   └── pkg1.py
    └── pkg2
        ├── __init__.py
        └── pkg2.py
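Rather than eyeballing tree output after every build, you could list just the non-Python files that landed in build/lib. A minimal sketch, assuming the default build/lib location used throughout this post; non_python_files is a hypothetical helper name.

```python
from pathlib import Path


def non_python_files(build_lib="build/lib"):
    """Return POSIX-style paths (relative to build_lib) of every non-.py file in the build tree."""
    root = Path(build_lib)
    if not root.is_dir():
        return []  # nothing has been built yet
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        # Keep regular files only, and skip Python sources and bytecode.
        if path.is_file() and path.suffix not in {".py", ".pyc"}
    )


if __name__ == "__main__":
    for name in non_python_files():
        print(name)
```

For the '*.md' configuration, the listing above corresponds to pkg1/IncludeMe.md, pkg1/sub_pkg1/IncludeMe.md and pkg1/sub_pkg1/IncludeMe2.md.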

Pretty neat, right? But wait, there's more! There is also the exclude_package_data parameter. This pretty much does what it says on the tin: it works just like package_data, except anything you specify is excluded, taking precedence over package_data. Given this setup.py file:

import setuptools

setuptools.setup(
    name="example_library",
    version="1.0.0",
    packages=setuptools.find_packages(),
    package_data={'': ['*'],},
    exclude_package_data={'': ['*.txt', '*.bar'],},
    python_requires='>=3.6',
)

These files are included:

build
├── bdist.linux-x86_64
└── lib
    ├── pkg1
    │   ├── sub_pkg1
    │   │   ├── __init__.py
    │   │   ├── IncludeMe.md
    │   │   ├── IncludeMe2.md
    │   │   └── sub_pkg1.py
    │   ├── __init__.py
    │   ├── IncludeMe.md
    │   └── pkg1.py
    └── pkg2
        ├── __init__.py
        ├── file1.foo
        └── pkg2.py
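The precedence rule fits in a few lines. This is a simplified model for a single package's file list, not setuptools' actual code; apply_package_data is a hypothetical name. A file survives only if it matches some include pattern and no exclude pattern, so exclusion always wins.

```python
from fnmatch import fnmatchcase


def apply_package_data(names, include, exclude):
    """Simplified model: keep names matching an include pattern and no exclude pattern."""

    def matches(name, patterns):
        return any(fnmatchcase(name, pat) for pat in patterns)

    # The exclude check comes last, so it overrides a matching include pattern.
    return sorted(n for n in names if matches(n, include) and not matches(n, exclude))
```

With the setup.py above, apply_package_data(["file1.foo", "file2.bar"], ["*"], ["*.txt", "*.bar"]) keeps only file1.foo, matching pkg2 in the listing.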

Final Thoughts

Hopefully you find this useful if you are packaging a Python project. I found this to be a bit confusing at first and fought with it for longer than I am proud to admit. If you want to read more on this subject, check out the setuptools documentation on data files.