Implementing new file formats

If you are interested in adding support for a new file format, please create a new issue to start a discussion. Please also attach a zip file with example data that can later be used for testing.

If you are familiar with GitHub, please create a pull request and make sure that

  • the file format reader is located in afmformats.formats.fmt_NAME (it may be a directory or a file, depending on the complexity)

  • the file format displays correctly in the docs and the docs compile without errors:

    cd docs
    pip install -r requirements.txt
    sphinx-build . _build
    # and open _build/index.html in a browser
    
  • you updated the CHANGELOG

  • your code is fully tested (create test functions in tests/test_fmt_NAME.py) and all other tests pass (there are a few general tests that every file format reader must pass):

    pip install pytest
    pytest tests
    
  • the data files for examples are named according to fmt-NAME-MOD_filename.suffix where MOD can be e.g. fd for force-distance data.
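
The naming convention above can be checked with a small helper. This is only a sketch: the regular expression, the function name parse_example_name, and the example file name are assumptions made for illustration and are not part of afmformats.

```python
import re

# Pattern for "fmt-NAME-MOD_filename.suffix" (assumed reading of the
# convention; NAME may itself contain hyphens, MOD may not).
EXAMPLE_NAME = re.compile(
    r"^fmt-(?P<name>[^_]+)-(?P<mod>[^_-]+)_(?P<filename>.+)\.(?P<suffix>[^.]+)$"
)


def parse_example_name(filename):
    """Return the parts of an example file name, or None if it does not match."""
    match = EXAMPLE_NAME.match(filename)
    return match.groupdict() if match else None


# Hypothetical example file for a format called "myf" with force-distance data
parts = parse_example_name("fmt-myf-fd_single-curve.myf")
```

Here parts holds the NAME, MOD, filename, and suffix components, and a file name that does not follow the convention yields None.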

If you cannot or do not want to work with GitHub, you may paste your code in the corresponding issue. This works best if the file format is not too complicated; otherwise things can get messy.

Basic file format reader structure

The best way to understand how file formats work in afmformats is to take a look at the file formats implemented already. For the sake of clarity, here is a file format reader template:

import pathlib

import numpy as np


__all__ = ["load_my_format"]


def load_my_format(path, callback=None, meta_override=None):
    """Loads AFM data from my format

    This is the main function for loading your file format. Please
    add a description here.

    Parameters
    ----------
    path: str or pathlib.Path
        path to a .myf file
    callback: callable
        function for progress tracking; must accept a float in
        [0, 1] as an argument.
    meta_override: dict
        if specified, contains key-value pairs of metadata that
        are used when loading the files
        (see :data:`afmformats.meta.META_FIELDS`)
    """
    if meta_override is None:
        meta_override = {}

    path = pathlib.Path(path)
    # Here you would start parsing your data and metadata from `path`
    # You should specify as many metadata keys as possible. See
    # afmformats.meta.DEF_ALL for a list of valid keys.
    metadata = {"path": path}
    # Valid column names are defined in afmformats.afm_data.known_columns.
    data = {"force": np.linspace(1e-9, 5e-9, 100),
            "height (measured)": np.linspace(2e-6, -1e-6, 100)}

    metadata.update(meta_override)
    dd = {"data": data,
          "metadata": metadata}

    if callback is not None:
        callback(1)

    # You may also return a list with more items in case the file format
    # contains more than one curve.
    return [dd]


recipe_myf = {
    "descr": "A short description",
    "loader": load_my_format,
    "suffix": ".myf",
    "modality": "force-distance",
    "maker": "designer of file format",
}

A few notes:

  • The recipe_myf contains the recipe for loading the file format into afmformats. It must be registered in afmformats/formats/__init__.py.

  • If you expect your file format reader to be slow (e.g. because several curves have to be loaded), you may call the callback function with a floating point value between 0 and 1 (progress tracking) between your loading steps. This gives users of e.g. PyJibe visual feedback on how long they will have to wait.

  • The meta_override dictionary is useful if your file format does not contain essential metadata such as spring constant or sensitivity. In such cases, you can raise an afmformats.errors.MissingMetaDataError to signal to PyJibe that it should ask the user for the missing metadata. For an example, please see the AFM workshop file format.
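
The last two notes can be sketched as follows. To keep the example runnable without afmformats, it defines its own stand-in exception; the real afmformats.errors.MissingMetaDataError may take different arguments. The function load_many_curves, its num_curves parameter, and the dummy data are hypothetical as well.

```python
class MissingMetaDataError(Exception):
    """Stand-in for afmformats.errors.MissingMetaDataError (assumed signature)."""

    def __init__(self, meta_keys, message=""):
        super().__init__(message)
        self.meta_keys = meta_keys


def load_many_curves(path, callback=None, meta_override=None, num_curves=4):
    """Build `num_curves` dummy curves and report progress after each one."""
    if meta_override is None:
        meta_override = {}

    # Pretend that the file itself does not store the spring constant,
    # so it has to come from `meta_override`.
    required = ["spring constant"]
    missing = [key for key in required if key not in meta_override]
    if missing:
        raise MissingMetaDataError(
            missing, "Missing metadata: " + ", ".join(missing))

    curves = []
    for ii in range(num_curves):
        metadata = {"path": path}
        metadata.update(meta_override)
        # Dummy values; a real reader would parse them from `path`.
        data = {"force": [1e-9, 3e-9, 5e-9],
                "height (measured)": [2e-6, 5e-7, -1e-6]}
        curves.append({"data": data, "metadata": metadata})
        if callback is not None:
            # fraction of curves loaded so far, always within [0, 1]
            callback((ii + 1) / num_curves)
    return curves


# A caller can first try without metadata and then supply what is missing.
progress = []
try:
    load_many_curves("example.myf")
except MissingMetaDataError as exc:
    missing_keys = exc.meta_keys
curves = load_many_curves("example.myf",
                          callback=progress.append,
                          meta_override={"spring constant": 0.05})
```

The caller sees which keys are missing, supplies them via meta_override, and receives one progress value per loaded curve.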

Optimizing data import

In most cases, it is not necessary to actually load the data from disk in the load_my_format function, especially if you have to parse large binary blobs or text files. In such cases, you can make use of the lazy loaders implemented in afmformats. For metadata, you can use afmformats.meta.LazyMetaValue and for data columns, you can use afmformats.lazy_loader.LazyData. The JPK file reader makes heavy use of those classes.
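
The idea behind lazy loading can be illustrated with a small, library-independent sketch. The class LazyColumns and its register method are hypothetical stand-ins; they do not reproduce the interface of afmformats.lazy_loader.LazyData or afmformats.meta.LazyMetaValue, so please consult those classes (and the JPK reader) for the actual usage.

```python
class LazyColumns:
    """Dict-like container that loads each column on first access only."""

    def __init__(self):
        self._loaders = {}
        self._cache = {}

    def register(self, column, func, *args):
        """Remember how to load `column` without loading it yet."""
        self._loaders[column] = (func, args)

    def __contains__(self, column):
        return column in self._loaders

    def __getitem__(self, column):
        if column not in self._cache:
            func, args = self._loaders[column]
            # The expensive read happens here, exactly once per column.
            self._cache[column] = func(*args)
        return self._cache[column]


calls = []


def read_column(name):
    """Pretend to parse a column from disk and record that it was called."""
    calls.append(name)
    return [0.0, 1.0, 2.0]


data = LazyColumns()
data.register("force", read_column, "force")
data.register("height (measured)", read_column, "height (measured)")

# Nothing has been read so far; the first access triggers the loader,
# the second access is served from the cache.
first = data["force"]
second = data["force"]
```

With this pattern, the loader function can return immediately after parsing the file structure, and columns that are never accessed are never read.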