PyPHM - Machinery data, made easy
Why build an open-source package for machinery data?
Why?
If you’re working on some sort of computer vision problem, you can readily access common datasets in PyTorch from something like torchvision datasets. Nice API. Quickly download and get things up and running.
I work with industrial machine data, in a field called Prognostics and Health Management (PHM). I strive to understand how and when machines fail using the tools of machine learning and data science. For this, I need machinery data. Maybe, you would think, there is a package, like those found in PyTorch or TensorFlow, to quickly download this industrial data? Maybe, just maybe?
Nope. There is no such thing.
So I decided to make one, and it’s called PyPHM. Everything is still in development (very alpha), but you can find the GitHub repo here.
What is PyPHM?
PyPHM is a package, written in Python, that lets users easily download and preprocess machinery data. It quickly prepares the data up to the point where it can be used for feature engineering or machine learning.
For example, you can download the UC-Berkeley Milling dataset, and get the x and y numpy arrays ready for machine learning, with only a few lines of code.
from pyphm.datasets.milling import MillingPrepMethodA
import numpy as np
from pathlib import Path
# define the location of where the raw data folders will be kept.
# e.g. the milling data will be in path_data_raw_folder/milling/
path_data_raw_folder = Path.cwd().parent / 'data' / 'raw'
# instantiate the MillingPrepMethodA class and download data if it does not exist
mill = MillingPrepMethodA(root=path_data_raw_folder, download=True, window_len=64, stride=64)
# create the x and y numpy arrays
x, y = mill.create_xy_arrays()
print("x.shape", x.shape)
print("y.shape", y.shape)
x.shape (11570, 64, 6)
y.shape (11570, 64, 3)
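To get a feel for what the window_len and stride arguments control, here is a minimal sketch of sliding-window segmentation on a synthetic signal. The window_signal helper and the random data are assumptions made up for illustration; they are not part of PyPHM, and the package's own implementation may differ.

```python
import numpy as np


def window_signal(signal, window_len, stride):
    """Split a (n_samples, n_channels) array into overlapping or
    non-overlapping windows of shape (n_windows, window_len, n_channels).

    Hypothetical helper for illustration only; not PyPHM code.
    """
    n_samples = signal.shape[0]
    if n_samples < window_len:
        # Not enough samples for even one full window.
        return np.empty((0, window_len) + signal.shape[1:], dtype=signal.dtype)
    # Start index of every full window; trailing partial windows are dropped.
    starts = range(0, n_samples - window_len + 1, stride)
    return np.stack([signal[s:s + window_len] for s in starts])


# Synthetic stand-in for one recording: 1000 samples, 6 channels.
rng = np.random.default_rng(0)
signal = rng.normal(size=(1000, 6))

# stride == window_len gives non-overlapping windows.
windows = window_signal(signal, window_len=64, stride=64)
print("non-overlapping:", windows.shape)

# A smaller stride gives overlapping windows, and therefore more of them.
windows_overlap = window_signal(signal, window_len=64, stride=32)
print("overlapping:", windows_overlap.shape)
```

With stride equal to window_len, as in the example above, no sample appears in more than one window. Lowering the stride produces more (overlapping) windows from the same recording.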
Goals
With PyPHM, I’m striving for the following:
- A package with a coherent and thoughtful API.
- Thorough documentation, with plenty of examples.
- A package that is well tested.
- A package built with continuous integration and continuous deployment.
- A package that implements common data preprocessing methods already used by researchers.
Progress
PyPHM can now be accessed through the Python Package Index (PyPI). Get it using pip install pyphm.
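After installing, a quick way to confirm that the package is visible to your Python environment is to ask the standard library for its installed version. This is a small sketch using importlib.metadata (Python 3.8+); the installed_version helper is a hypothetical name used only for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing.

    Hypothetical helper for illustration only.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


pyphm_version = installed_version("pyphm")
if pyphm_version is None:
    print("pyphm is not installed; run: pip install pyphm")
else:
    print(f"pyphm {pyphm_version} is installed")
```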
Three datasets are available, with plans for many more to come.
I have started drafting the documentation and working on examples. Good documentation is something I see as very much lacking in my field. I hope PyPHM can help remedy that.
Future Plans
There is still much work to do! In the next while I’ll be adding more datasets and improving the documentation. I’ll also be learning how to use Read the Docs to generate the documentation.
Stay tuned!