Merged
Changes from 1 commit (of 20 commits)
deprecated: loadData to load_data, and move it
sbillinge committed Jan 17, 2026
commit 7cd01f1defe2413504188802bdf2c72a2523361a
24 changes: 12 additions & 12 deletions docs/source/examples/parsers_example.rst
@@ -13,23 +13,23 @@ Using the parsers module, we can load file data into simple and easy-to-work-wit
Our goal will be to extract the data, and the parameters listed in the header, from this file and
load it into our program.

-2) To get the data table, we will use the ``loadData`` function. The default behavior of this
+2) To get the data table, we will use the ``load_data`` function. The default behavior of this
function is to find and extract a data table from a file.

.. code-block:: python

-from diffpy.utils.parsers.loaddata import loadData
-data_table = loadData('<PATH to data.txt>')
+from diffpy.utils.tools import load_data
+data_table = load_data('<PATH to data.txt>')

While this will work with most datasets, on our ``data.txt`` file, we got a ``ValueError``. The reason for this is
due to the comments ``$ Phase Transition Near This Temperature Range`` and ``--> Note Significant Jump in Rw <--``
embedded within the dataset. To fix this, try using the ``comments`` parameter.

.. code-block:: python

-data_table = loadData('<PATH to data.txt>', comments=['$', '-->'])
+data_table = load_data('<PATH to data.txt>', comments=['$', '-->'])

-This parameter tells ``loadData`` that any lines beginning with ``$`` and ``-->`` are just comments and
+This parameter tells ``load_data`` that any lines beginning with ``$`` and ``-->`` are just comments and
more entries in our data table may follow.

Here are a few other parameters to test out:
@@ -39,30 +39,30 @@ Here are a few other parameters to test out:

.. code-block:: python

-loadData('<PATH to data.txt>', comments=['$', '-->'], delimiter=',')
+load_data('<PATH to data.txt>', comments=['$', '-->'], delimiter=',')

returns an empty list.
* ``minrows=50``: Only look for data tables with at least 50 rows. Since our data table has much less than that many
rows, running

.. code-block:: python

-loadData('<PATH to data.txt>', comments=['$', '-->'], minrows=50)
+load_data('<PATH to data.txt>', comments=['$', '-->'], minrows=50)

returns an empty list.
* ``usecols=[0, 3]``: Only return the 0th and 3rd columns (zero-indexed) of the data table. For ``data.txt``, this
corresponds to the temperature and rw columns.

.. code-block:: python

-loadData('<PATH to data.txt>', comments=['$', '-->'], usecols=[0, 3])
+load_data('<PATH to data.txt>', comments=['$', '-->'], usecols=[0, 3])

-3) Next, to get the header information, we can again use ``loadData``,
+3) Next, to get the header information, we can again use ``load_data``,
but this time with the ``headers`` parameter enabled.

.. code-block:: python

-hdata = loadData('<PATH to data.txt>', comments=['$', '-->'], headers=True)
+hdata = load_data('<PATH to data.txt>', comments=['$', '-->'], headers=True)

4) Rather than working with separate ``data_table`` and ``hdata`` objects, it may be easier to combine them into a single
dictionary. We can do so using the ``serialize_data`` function.
@@ -116,8 +116,8 @@ The returned value, ``parsed_file_data``, is the dictionary we just added to ``s

.. code-block:: python

-data_table = loadData('<PATH to moredata.txt>')
-hdata = loadData('<PATH to moredata.txt>', headers=True)
+data_table = load_data('<PATH to moredata.txt>')
+hdata = load_data('<PATH to moredata.txt>', headers=True)
serialize_data('<PATH to moredata.txt>', hdata, data_table, serial_file='<PATH to serialdata.json>')

The serial file ``serialfile.json`` should now contain two entries: ``data.txt`` and ``moredata.txt``.
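The ``comments`` fix in this example relies on ``load_data`` forwarding extra keyword arguments to ``numpy.loadtxt``. A minimal sketch of that behavior, calling ``numpy.loadtxt`` directly on a small made-up table (not the actual ``data.txt``) that contains the same two kinds of annotation lines:

```python
import io

import numpy as np

# Made-up stand-in for data.txt: a numeric table with two embedded
# annotation lines like the ones described in the parsers example.
TABLE = """\
300 0.10 1.5 0.20
310 0.11 1.6 0.21
$ Phase Transition Near This Temperature Range
320 0.12 1.7 0.35
--> Note Significant Jump in Rw <--
330 0.13 1.8 0.36
"""


def load(**kwargs):
    """Parse TABLE with numpy.loadtxt, forwarding any keyword arguments."""
    return np.loadtxt(io.StringIO(TABLE), **kwargs)


# With only the default comment marker ('#'), the '$' line is treated as
# data and numpy cannot convert it to floats.
try:
    load()
except ValueError as err:
    print("ValueError:", err)

# Declaring '$' and '-->' as comment markers skips both annotation lines.
data = load(comments=["$", "-->"])
print(data.shape)  # (4, 4)

# usecols=[0, 3] keeps only the zero-indexed columns 0 and 3.
print(load(comments=["$", "-->"], usecols=[0, 3]))
```

Because the keyword arguments pass straight through, anything ``numpy.loadtxt`` accepts for ``comments``, ``delimiter`` or ``usecols`` behaves the same way when given to ``load_data``.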
6 changes: 3 additions & 3 deletions docs/source/examples/resample_example.rst
@@ -16,9 +16,9 @@ given enough datapoints.

.. code-block:: python

-from diffpy.utils.parsers.loaddata import loadData
-nickel_datatable = loadData('<PATH to Nickel.gr>')
-nitarget_datatable = loadData('<PATH to NiTarget.gr>')
+from diffpy.utils.tools import load_data
+nickel_datatable = load_data('<PATH to Nickel.gr>')
+nitarget_datatable = load_data('<PATH to NiTarget.gr>')

Each data table has two columns: first is the grid and second is the function value.
To extract the columns, we can utilize the serialize function ...
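Splitting a two-column table into grid and function values needs nothing beyond NumPy: the new ``load_data`` docstring notes that ``unpack=True`` (forwarded to ``numpy.loadtxt``) and a transpose with ``.T`` give the same result. A short sketch with made-up ``.gr``-style values, assuming NumPy is installed:

```python
import io

import numpy as np

# Made-up two-column table: grid (r) first, function value G(r) second.
GR_TEXT = """\
0.00 0.000
0.01 0.052
0.02 0.101
"""

# Option 1: ask numpy to return the columns separately.
r, gr = np.loadtxt(io.StringIO(GR_TEXT), unpack=True)

# Option 2: load the full table, then transpose it into columns.
table = np.loadtxt(io.StringIO(GR_TEXT))
r2, gr2 = table.T

print(r)   # the grid column
print(gr)  # the function-value column
```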
4 changes: 2 additions & 2 deletions docs/source/utilities/parsers_utility.rst
@@ -5,7 +5,7 @@ Parsers Utility

The ``diffpy.utils.parsers`` module allows users to easily and robustly load file data into a Python project.

-- ``loaddata.loadData()``: Find and load a data table/block from a text file. This seems to work for most datafiles
+- ``loaddata.load_data()``: Find and load a data table/block from a text file. This seems to work for most datafiles
including those generated by diffpy programs. Running only ``numpy.loadtxt`` will result in errors
for most these files as there is often excess data or parameters stored above the data block.
Users can instead choose to load all the parameters of the form ``<param_name> = <param_value>`` into a dictionary
@@ -17,7 +17,7 @@ The ``diffpy.utils.parsers`` module allows users to easily and robustly load fil
- ``serialization.deserialize_data()``: Load data from a serial file format into a Python dictionary. Currently, the only supported
serial format is ``.json``.

-- ``serialization.serialize_data()``: Serialize the data generated by ``loadData()`` into a serial file format. Currently, the only
+- ``serialization.serialize_data()``: Serialize the data generated by ``load_data()`` into a serial file format. Currently, the only
supported serial format is ``.json``.

For a more in-depth tutorial for how to use these parser utilities, click :ref:`here <Parsers Example>`.
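To see why plain ``numpy.loadtxt`` fails on such files, and how searching for the data block helps, here is a simplified sketch of the block search that ``load_data`` performs: find the first run of at least ``minrows`` lines that parse entirely as floats with a constant column count. The helpers ``count_float_columns`` and ``find_block_start`` are hypothetical names for illustration, not part of diffpy.utils; unlike the real function, this version works with line indices instead of byte offsets and ignores ``usecols`` and ``delimiter``.

```python
import io

import numpy as np

# Made-up file with "name = value" parameters stored above the data block.
TEXT = """\
title = nickel
qmax = 25.0
[defaults]
0.0 1.0
0.1 1.2
0.2 1.1
"""


def count_float_columns(line):
    """Return the number of columns if every token is a float, else 0."""
    words = line.split()
    try:
        [float(w) for w in words]
    except ValueError:
        return 0
    return len(words)


def find_block_start(lines, minrows=3):
    """Return the index of the first line of a run of at least ``minrows``
    all-float lines with the same column count, or None if there is none."""
    start, ncols, nrows = None, None, 0
    for i, line in enumerate(lines):
        nc = count_float_columns(line)
        if nc == 0:
            # Non-numeric (or blank) line: restart the search.
            start, ncols, nrows = None, None, 0
            continue
        if start is None or nc != ncols:
            # A new candidate block begins on this line.
            start, ncols, nrows = i, nc, 0
        nrows += 1
        if nrows >= minrows:
            return start
    return None


lines = TEXT.splitlines()

# Plain loadtxt trips over the header lines.
try:
    np.loadtxt(io.StringIO(TEXT))
except ValueError as err:
    print("loadtxt failed:", err)

# Starting at the detected block lets loadtxt parse the numbers.
start = find_block_start(lines)
data = np.loadtxt(lines[start:])
print(start, data.shape)  # 3 (3, 2)
```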
23 changes: 23 additions & 0 deletions news/depr-tests.rst
@@ -0,0 +1,23 @@
+**Added:**

+* <news item>

+**Changed:**

+* load_data now takes a Path or a string for the file-path

+**Deprecated:**

+* diffpy.utils.parsers.loaddata.loadData replaced by diffpy.utils.tools.load_data

+**Removed:**

+* <news item>

+**Fixed:**

+* <news item>

+**Security:**

+* <news item>
12 changes: 8 additions & 4 deletions src/diffpy/utils/_deprecator.py
@@ -20,7 +20,7 @@ def deprecated(message, *, category=DeprecationWarning, stacklevel=1):

    .. code-block:: python

-        from diffpy._deprecations import deprecated
+        from diffpy.utils._deprecator import deprecated
        import warnings

        @deprecated("old_function is deprecated; use new_function instead")
@@ -39,7 +39,6 @@ def new_function(x, y):
    .. code-block:: python

        from diffpy._deprecations import deprecated
        import warnings

        warnings.simplefilter("always", DeprecationWarning)

@@ -83,7 +82,9 @@ def wrapper(*args, **kwargs):
    return decorator


-def deprecation_message(base, old_name, new_name, removal_version):
+def deprecation_message(
+    base, old_name, new_name, removal_version, new_base=None
+):

Review comment (Contributor): don't we want our function names to be verbs not nouns? I think I prefer build_deprecation_message?

Reply (Contributor Author): @sbillinge Okay we can change this

    """Generate a deprecation message.

    Parameters
@@ -102,7 +103,10 @@ def deprecation_message(base, old_name, new_name, removal_version):
    str
        A formatted deprecation message.
    """
+    if new_base is None:
+        new_base = base
    return (
        f"'{base}.{old_name}' is deprecated and will be removed in "
-        f"version {removal_version}. Please use '{base}.{new_name}' instead."
+        f"version {removal_version}. Please use '{new_base}.{new_name}' "
+        f"instead."
    )
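With the new optional ``new_base`` argument, the message can name a replacement that lives in a different module, which is what the ``loaddata.py`` change in this commit relies on. A standalone copy of the updated function, called with the same arguments ``loaddata.py`` passes, shows the resulting text:

```python
def deprecation_message(
    base, old_name, new_name, removal_version, new_base=None
):
    """Generate a deprecation message (copied from the diff for illustration)."""
    # Fall back to the old module path when no new location is given.
    if new_base is None:
        new_base = base
    return (
        f"'{base}.{old_name}' is deprecated and will be removed in "
        f"version {removal_version}. Please use '{new_base}.{new_name}' "
        f"instead."
    )


# Same arguments as the module-level call in loaddata.py.
msg = deprecation_message(
    "diffpy.utils.parsers.loaddata",
    "loadData",
    "load_data",
    "4.0.0",
    new_base="diffpy.utils.tools",
)
print(msg)
```

This prints a single line: ``'diffpy.utils.parsers.loaddata.loadData' is deprecated and will be removed in version 4.0.0. Please use 'diffpy.utils.tools.load_data' instead.`` Without ``new_base`` the function falls back to ``base``, so existing callers keep the old wording.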
15 changes: 14 additions & 1 deletion src/diffpy/utils/parsers/loaddata.py
@@ -18,8 +18,21 @@
import numpy

from diffpy.utils import validators
+from diffpy.utils._deprecator import deprecated, deprecation_message

+base = "diffpy.utils.parsers.loaddata"
+removal_version = "4.0.0"

+loaddata_deprecation_msg = deprecation_message(
+    base,
+    "loadData",
+    "load_data",
+    removal_version,
+    new_base="diffpy.utils.tools",
+)


+@deprecated(loaddata_deprecation_msg)
def loadData(
    filename, minrows=10, headers=False, hdel="=", hignore=None, **kwargs
):
@@ -254,7 +267,7 @@ def readfp(self, fp, append=False):

File details include:
* File name.
-* All data blocks findable by loadData.
+* All data blocks findable by load_data.
* Headers (if present) for each data block. (Generally the headers
contain column name information).
"""
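After this change every call to ``loadData`` should emit the deprecation message while still returning its usual result. Only part of ``diffpy/utils/_deprecator.py`` appears in this diff, so the ``deprecated`` decorator below is a hypothetical minimal stand-in with the same ``(message, *, category, stacklevel)`` signature, not the real implementation; ``old_load`` is a made-up placeholder for the deprecated function. The sketch checks the warning with ``warnings.catch_warnings``:

```python
import functools
import warnings


def deprecated(message, *, category=DeprecationWarning, stacklevel=1):
    """Hypothetical minimal stand-in for diffpy.utils._deprecator.deprecated."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # +1 so the reported location is the caller of the wrapped
            # function rather than this wrapper.
            warnings.warn(message, category=category, stacklevel=stacklevel + 1)
            return func(*args, **kwargs)

        return wrapper

    return decorator


MSG = (
    "'diffpy.utils.parsers.loaddata.loadData' is deprecated and will be "
    "removed in version 4.0.0. Please use 'diffpy.utils.tools.load_data' "
    "instead."
)


@deprecated(MSG)
def old_load(x):
    """Placeholder for the deprecated loader; just returns its input."""
    return x


with warnings.catch_warnings(record=True) as caught:
    # Record DeprecationWarning even though Python hides it by default.
    warnings.simplefilter("always", DeprecationWarning)
    result = old_load(42)

print(result)  # 42
print(caught[0].category.__name__)  # DeprecationWarning
print(caught[0].message)
```

Setting the filter to ``"always"`` inside ``catch_warnings`` matters because Python ignores ``DeprecationWarning`` by default outside ``__main__``, which is also why the docstring example in ``_deprecator.py`` calls ``warnings.simplefilter("always", DeprecationWarning)``.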
180 changes: 180 additions & 0 deletions src/diffpy/utils/tools.py
@@ -8,6 +8,7 @@
from scipy.signal import convolve
from xraydb import material_mu

from diffpy.utils import validators
from diffpy.utils.parsers.loaddata import loadData


@@ -396,3 +397,182 @@ def compute_mud(filepath):
        key=lambda pair: pair[1],
    )
    return best_mud


+def load_data(
+    filename, minrows=10, headers=False, hdel="=", hignore=None, **kwargs
+):
+    """Find and load data from a text file.

+    The data block is identified as the first matrix block of at least
+    minrows rows and constant number of columns. This seems to work for most
+    of the datafiles including those generated by diffpy programs.

+    Parameters
+    ----------
+    filename: Path or string
+        Name of the file we want to load data from.
+    minrows: int
+        Minimum number of rows in the first data block. All rows must have
+        the same number of floating point values.
+    headers: bool
+        when False (default), the function returns a numpy array of the data
+        in the data block. When True, the function instead returns a
+        dictionary of parameters and their corresponding values parsed from
+        header (information prior the data block). See hdel and hignore for
+        options to help with parsing header information.
+    hdel: str
+        (Only used when headers enabled.) Delimiter for parsing header
+        information (default '='). e.g. using default hdel, the line '
+        parameter = p_value' is put into the dictionary as
+        {parameter: p_value}.
+    hignore: list
+        (Only used when headers enabled.) Ignore header rows beginning with
+        any elements in hignore. e.g. hignore=['# ', '['] causes the
+        following lines to be skipped: '# qmax=10', '[defaults]'.
+    kwargs:
+        Keyword arguments that are passed to numpy.loadtxt including the
+        following arguments below. (See numpy.loadtxt for more details.) Only
+        pass kwargs used by numpy.loadtxt.

+    Useful kwargs
+    =============
+    comments: str, sequence of str
+        The characters or list of characters used to indicate the start of a
+        comment (default '#'). Comment lines are ignored.
+    delimiter: str
+        Delimiter for the data in the block (default use whitespace). For
+        comma-separated data blocks, set delimiter to ','.
+    unpack: bool
+        Return data as a sequence of columns that allows tuple unpacking such
+        as x, y = load_data(FILENAME, unpack=True). Note transposing the
+        loaded array as load_data(FILENAME).T has the same effect.
+    usecols:
+        Zero-based index of columns to be loaded, by default use all detected
+        columns. The reading skips data blocks that do not have the usecols-
+        specified columns.

+    Returns
+    -------
+    data_block: ndarray
+        A numpy array containing the found data block. (This is not returned
+        if headers is enabled.)
+    hdata: dict
+        If headers are enabled, return a dictionary of parameters read from
+        the header.
+    """
+    from numpy import array, loadtxt

+    # for storing header data
+    hdata = {}
+    # determine the arguments
+    delimiter = kwargs.get("delimiter")
+    usecols = kwargs.get("usecols")
+    # required at least one column of floating point values
+    mincv = (1, 1)
+    # but if usecols is specified, require sufficient number of columns
+    # where the used columns contain floats
+    if usecols is not None:
+        hiidx = max(-min(usecols), max(usecols) + 1)
+        mincv = (hiidx, len(set(usecols)))

+    # Check if a line consists of floats only and return their count
+    # Return zero if some strings cannot be converted.
+    def countcolumnsvalues(line):
+        try:
+            words = line.split(delimiter)
+            # remove trailing blank columns
+            while words and not words[-1].strip():
+                words.pop(-1)
+            nc = len(words)
+            if usecols is not None:
+                nv = len([float(words[i]) for i in usecols])
+            else:
+                nv = len([float(w) for w in words])
+        except (IndexError, ValueError):
+            nc = nv = 0
+        return nc, nv

+    # Check if file exists before trying to open
+    filename = Path(filename)
+    if not filename.is_file():
+        raise IOError(
+            (
+                f"File {str(filename)} cannot be found. "
+                "Please rerun the program specifying a valid filename."
+            )
+        )

+    # make sure fid gets cleaned up
+    with open(filename, "rb") as fid:
+        # search for the start of datablock
+        start = ncvblock = None
+        fpos = (0, 0)
+        nrows = 0
+        for line in fid:
+            # decode line
+            dline = line.decode()
+            # find header information if requested
+            if headers:
+                hpair = dline.split(hdel)
+                flag = True
+                # ensure number of non-blank arguments is two
+                if len(hpair) != 2:
+                    flag = False
+                else:
+                    # ignore if an argument is blank
+                    hpair[0] = hpair[0].strip() # name of data entry
+                    hpair[1] = hpair[1].strip() # value of entry
+                    if not hpair[0] or not hpair[1]:
+                        flag = False
+                    else:
+                        # check if row has an ignore tag
+                        if hignore is not None:
+                            for tag in hignore:
+                                taglen = len(tag)
+                                if (
+                                    len(hpair[0]) >= taglen
+                                    and hpair[0][:taglen] == tag
+                                ):
+                                    flag = False
+                # add header data
+                if flag:
+                    name = hpair[0]
+                    value = hpair[1]
+                    # check if data value should be stored as float
+                    if validators.is_number(hpair[1]):
+                        value = float(hpair[1])
+                    hdata.update({name: value})
+            # continue search for the start of datablock
+            fpos = (fpos[1], fpos[1] + len(line))
+            line = dline
+            ncv = countcolumnsvalues(line)
+            if ncv < mincv:
+                start = None
+                continue
+            # ncv is acceptable here, require the same number of columns
+            # throughout the datablock
+            if start is None or ncv != ncvblock:
+                ncvblock = ncv
+                nrows = 0
+                start = fpos[0]
+            nrows += 1
+            # block was found here!
+            if nrows >= minrows:
+                break

+        # Return header data if requested
+        if headers:
+            return hdata # Return, so do not proceed to reading datablock

+        # Return an empty array when no data found.
+        # loadtxt would otherwise raise an exception on loading from EOF.
+        if start is None:
+            data_block = array([], dtype=float)
+        else:
+            fid.seek(start)
+            # always use usecols argument so that loadtxt does not crash
+            # in case of trailing delimiters.
+            kwargs.setdefault("usecols", list(range(ncvblock[0])))
+            data_block = loadtxt(fid, **kwargs)
+    return data_block
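When ``headers=True``, ``load_data`` splits each line on ``hdel``, keeps the pair only if there are exactly two non-blank parts and the name does not start with any ``hignore`` tag, and stores numeric values as floats. The sketch below re-creates just that header branch for a list of strings so the rules are easy to try. ``parse_headers`` is a hypothetical helper, and the ``float()``/``ValueError`` check is an assumed stand-in for ``validators.is_number``, whose exact rules are not shown in this diff.

```python
def parse_headers(lines, hdel="=", hignore=None):
    """Simplified, hypothetical re-creation of the header branch of load_data."""
    hdata = {}
    for line in lines:
        hpair = line.split(hdel)
        # Require exactly two parts around the delimiter.
        if len(hpair) != 2:
            continue
        name, value = hpair[0].strip(), hpair[1].strip()
        # Skip entries where either side is blank.
        if not name or not value:
            continue
        # Skip names that begin with any ignore tag.
        if hignore is not None and any(name.startswith(t) for t in hignore):
            continue
        # Assumed stand-in for validators.is_number: store floats when possible.
        try:
            hdata[name] = float(value)
        except ValueError:
            hdata[name] = value
    return hdata


header = [
    "title = nickel at 300 K",
    "qmax = 25.0",
    "# qmin = 0.5",
    "[defaults]",
    "a = b = c",
    "0.0 1.0",
]
# Uses the hignore example from the load_data docstring.
print(parse_headers(header, hignore=["# ", "["]))
# {'title': 'nickel at 300 K', 'qmax': 25.0}
```

Note that ``[defaults]`` and ``0.0 1.0`` are dropped even without ``hignore`` because they contain no ``=``, while ``# qmin = 0.5`` is only dropped because of the ``'# '`` tag.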