deprecated: loadData to load_data, and move it

diffpy · sbillinge · Feb 4, 2026 · Jan 17, 2026 · Jan 17, 2026 · Jan 18, 2026
commit 7cd01f1defe2413504188802bdf2c72a2523361a
diff --git a/docs/source/examples/parsers_example.rst b/docs/source/examples/parsers_example.rst
@@ -13,23 +13,23 @@ Using the parsers module, we can load file data into simple and easy-to-work-wit
    Our goal will be to extract the data, and the parameters listed in the header, from this file and
    load it into our program.
 
-2) To get the data table, we will use the ``loadData`` function. The default behavior of this
+2) To get the data table, we will use the ``load_data`` function. The default behavior of this
    function is to find and extract a data table from a file.
 
 .. code-block:: python
 
-     from diffpy.utils.parsers.loaddata import loadData
-     data_table = loadData('<PATH to data.txt>')
+     from diffpy.utils.tools import load_data
+     data_table = load_data('<PATH to data.txt>')
 
 While this works for most datasets, loading our ``data.txt`` file raises a ``ValueError``. This is
 due to the comments ``$ Phase Transition Near This Temperature Range`` and ``--> Note Significant Jump in Rw <--``
 embedded within the dataset. To fix this, try using the ``comments`` parameter.
 
 .. code-block:: python
 
-     data_table = loadData('<PATH to data.txt>', comments=['$', '-->'])
+     data_table = load_data('<PATH to data.txt>', comments=['$', '-->'])
 
-This parameter tells ``loadData`` that any lines beginning with ``$`` and ``-->`` are just comments and
+This parameter tells ``load_data`` that any lines beginning with ``$`` or ``-->`` are just comments and
 that more entries in our data table may follow.
 
 Here are a few other parameters to test out:
@@ -39,30 +39,30 @@ Here are a few other parameters to test out:
 
 .. code-block:: python
 
-       loadData('<PATH to data.txt>', comments=['$', '-->'], delimiter=',')
+       load_data('<PATH to data.txt>', comments=['$', '-->'], delimiter=',')
 
 returns an empty list.
    * ``minrows=50``: Only look for data tables with at least 50 rows. Since our data table has far fewer than 50
      rows, running
 
 .. code-block:: python
 
-       loadData('<PATH to data.txt>', comments=['$', '-->'], minrows=50)
+       load_data('<PATH to data.txt>', comments=['$', '-->'], minrows=50)
 
 returns an empty list.
    * ``usecols=[0, 3]``: Only return the 0th and 3rd columns (zero-indexed) of the data table. For ``data.txt``, this
      corresponds to the temperature and rw columns.
 
 .. code-block:: python
 
-       loadData('<PATH to data.txt>', comments=['$', '-->'], usecols=[0, 3])
+       load_data('<PATH to data.txt>', comments=['$', '-->'], usecols=[0, 3])
 
-3) Next, to get the header information, we can again use ``loadData``,
+3) Next, to get the header information, we can again use ``load_data``,
    but this time with the ``headers`` parameter enabled.
 
 .. code-block:: python
 
-     hdata = loadData('<PATH to data.txt>', comments=['$', '-->'], headers=True)
+     hdata = load_data('<PATH to data.txt>', comments=['$', '-->'], headers=True)
 
 4) Rather than working with separate ``data_table`` and ``hdata`` objects, it may be easier to combine them into a single
    dictionary. We can do so using the ``serialize_data`` function.
@@ -116,8 +116,8 @@ The returned value, ``parsed_file_data``, is the dictionary we just added to ``s
 
 .. code-block:: python
 
-     data_table = loadData('<PATH to moredata.txt>')
-     hdata = loadData('<PATH to moredata.txt>', headers=True)
+     data_table = load_data('<PATH to moredata.txt>')
+     hdata = load_data('<PATH to moredata.txt>', headers=True)
      serialize_data('<PATH to moredata.txt>', hdata, data_table, serial_file='<PATH to serialdata.json>')
 
 The serial file ``serialdata.json`` should now contain two entries: ``data.txt`` and ``moredata.txt``.

diff --git a/docs/source/examples/resample_example.rst b/docs/source/examples/resample_example.rst
@@ -16,9 +16,9 @@ given enough datapoints.
 
 .. code-block:: python
 
-       from diffpy.utils.parsers.loaddata import loadData
-       nickel_datatable = loadData('<PATH to Nickel.gr>')
-       nitarget_datatable = loadData('<PATH to NiTarget.gr>')
+       from diffpy.utils.tools import load_data
+       nickel_datatable = load_data('<PATH to Nickel.gr>')
+       nitarget_datatable = load_data('<PATH to NiTarget.gr>')
 
 Each data table has two columns: first is the grid and second is the function value.
 To extract the columns, we can utilize the serialize function ...

diff --git a/docs/source/utilities/parsers_utility.rst b/docs/source/utilities/parsers_utility.rst
@@ -5,7 +5,7 @@ Parsers Utility
 
 The ``diffpy.utils.parsers`` module allows users to easily and robustly load file data into a Python project.
 
-- ``loaddata.loadData()``: Find and load a data table/block from a text file. This seems to work for most datafiles
+- ``diffpy.utils.tools.load_data()``: Find and load a data table/block from a text file. This seems to work for most datafiles
   including those generated by diffpy programs. Running only ``numpy.loadtxt`` will result in errors
   for most of these files as there is often excess data or parameters stored above the data block.
   Users can instead choose to load all the parameters of the form ``<param_name> = <param_value>`` into a dictionary
@@ -17,7 +17,7 @@ The ``diffpy.utils.parsers`` module allows users to easily and robustly load fil
 - ``serialization.deserialize_data()``: Load data from a serial file format into a Python dictionary. Currently, the only supported
   serial format is ``.json``.
 
-- ``serialization.serialize_data()``: Serialize the data generated by ``loadData()`` into a serial file format. Currently, the only
+- ``serialization.serialize_data()``: Serialize the data generated by ``load_data()`` into a serial file format. Currently, the only
   supported serial format is ``.json``.
 
 For a more in-depth tutorial for how to use these parser utilities, click :ref:`here <Parsers Example>`.
diff --git a/news/depr-tests.rst b/news/depr-tests.rst
@@ -0,0 +1,23 @@
+**Added:**
+
+* <news item>
+
+**Changed:**
+
+* load_data now takes a Path or a string for the file-path
+
+**Deprecated:**
+
+* diffpy.utils.parsers.loaddata.loadData replaced by diffpy.utils.tools.load_data
+
+**Removed:**
+
+* <news item>
+
+**Fixed:**
+
+* <news item>
+
+**Security:**
+
+* <news item>
diff --git a/src/diffpy/utils/_deprecator.py b/src/diffpy/utils/_deprecator.py
@@ -20,7 +20,7 @@ def deprecated(message, *, category=DeprecationWarning, stacklevel=1):
 
     .. code-block:: python
 
-        from diffpy._deprecations import deprecated
+        from diffpy.utils._deprecator import deprecated
         import warnings
 
         @deprecated("old_function is deprecated; use new_function instead")
@@ -39,7 +39,6 @@ def new_function(x, y):
     .. code-block:: python
 
         from diffpy._deprecations import deprecated
-        import warnings
 
         warnings.simplefilter("always", DeprecationWarning)
 
@@ -83,7 +82,9 @@ def wrapper(*args, **kwargs):
     return decorator
 
 
-def deprecation_message(base, old_name, new_name, removal_version):
+def deprecation_message(
+    base, old_name, new_name, removal_version, new_base=None
+):
     """Generate a deprecation message.
 
     Parameters
@@ -102,7 +103,10 @@ def deprecation_message(base, old_name, new_name, removal_version):
     str
         A formatted deprecation message.
     """
+    if new_base is None:
+        new_base = base
     return (
         f"'{base}.{old_name}' is deprecated and will be removed in "
-        f"version {removal_version}. Please use '{base}.{new_name}' instead."
+        f"version {removal_version}. Please use '{new_base}.{new_name}' "
+        f"instead."
     )
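
Note on the hunk above: the sketch below illustrates how the new ``new_base`` argument changes the generated message when a function is both renamed and moved. ``deprecation_message`` is copied from the patched helper. The ``deprecated`` decorator body is only a hypothetical stand-in (its real body is not shown in this diff), and the toy ``load_data``/``loadData`` pair, including the delegation from old to new, is an assumption made for illustration.

```python
import functools
import warnings


def deprecation_message(
    base, old_name, new_name, removal_version, new_base=None
):
    """Build the message; new_base falls back to base when not given."""
    if new_base is None:
        new_base = base
    return (
        f"'{base}.{old_name}' is deprecated and will be removed in "
        f"version {removal_version}. Please use '{new_base}.{new_name}' "
        f"instead."
    )


def deprecated(message, *, category=DeprecationWarning, stacklevel=1):
    """Hypothetical stand-in for diffpy.utils._deprecator.deprecated."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Emit the warning on every call, then run the wrapped function.
            warnings.warn(message, category=category, stacklevel=stacklevel)
            return func(*args, **kwargs)

        return wrapper

    return decorator


def load_data(filename):
    # Toy replacement function (assumption, not the real loader).
    return f"loaded {filename}"


msg = deprecation_message(
    "diffpy.utils.parsers.loaddata",
    "loadData",
    "load_data",
    "4.0.0",
    new_base="diffpy.utils.tools",
)


@deprecated(msg)
def loadData(filename):
    # Toy deprecated alias that delegates to the new name (assumption).
    return load_data(filename)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    result = loadData("data.txt")

print(caught[0].message)
```

Without ``new_base``, the message would point users at ``diffpy.utils.parsers.loaddata.load_data``, which does not exist after this change. The keyword keeps existing callers of ``deprecation_message`` working unchanged.
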
diff --git a/src/diffpy/utils/parsers/loaddata.py b/src/diffpy/utils/parsers/loaddata.py
@@ -18,8 +18,21 @@
 import numpy
 
 from diffpy.utils import validators
+from diffpy.utils._deprecator import deprecated, deprecation_message
 
+base = "diffpy.utils.parsers.loaddata"
+removal_version = "4.0.0"
 
+loaddata_deprecation_msg = deprecation_message(
+    base,
+    "loadData",
+    "load_data",
+    removal_version,
+    new_base="diffpy.utils.tools",
+)
+
+
+@deprecated(loaddata_deprecation_msg)
 def loadData(
     filename, minrows=10, headers=False, hdel="=", hignore=None, **kwargs
 ):
@@ -254,7 +267,7 @@ def readfp(self, fp, append=False):
 
         File details include:
          *  File name.
-         *  All data blocks findable by loadData.
+         *  All data blocks findable by load_data.
          *  Headers (if present) for each data block. (Generally the headers
             contain column name information).
         """

diff --git a/src/diffpy/utils/tools.py b/src/diffpy/utils/tools.py
@@ -8,6 +8,7 @@
 from scipy.signal import convolve
 from xraydb import material_mu
 
+from diffpy.utils import validators
 from diffpy.utils.parsers.loaddata import loadData
 
 
@@ -396,3 +397,182 @@ def compute_mud(filepath):
         key=lambda pair: pair[1],
     )
     return best_mud
+
+
+def load_data(
+    filename, minrows=10, headers=False, hdel="=", hignore=None, **kwargs
+):
+    """Find and load data from a text file.
+
+    The data block is identified as the first matrix block of at least
+    minrows rows and constant number of columns. This seems to work for most
+    of the datafiles including those generated by diffpy programs.
+
+    Parameters
+    ----------
+    filename: Path or string
+        Name of the file we want to load data from.
+    minrows: int
+        Minimum number of rows in the first data block. All rows must have
+        the same number of floating point values.
+    headers: bool
+        When False (default), the function returns a numpy array of the data
+        in the data block. When True, the function instead returns a
+        dictionary of parameters and their corresponding values parsed from
+        the header (information prior to the data block). See hdel and
+        hignore for options to help with parsing header information.
+    hdel: str
+        (Only used when headers enabled.) Delimiter for parsing header
+        information (default '='). e.g. using default hdel, the line '
+        parameter = p_value' is put into the dictionary as
+        {parameter: p_value}.
+    hignore: list
+        (Only used when headers enabled.) Ignore header rows beginning with
+        any elements in hignore. e.g. hignore=['# ', '['] causes the
+        following lines to be skipped: '# qmax=10', '[defaults]'.
+    kwargs:
+        Keyword arguments that are passed to numpy.loadtxt including the
+        following arguments below. (See numpy.loadtxt for more details.) Only
+        pass kwargs used by numpy.loadtxt.
+
+    Useful kwargs
+    =============
+    comments: str, sequence of str
+        The characters or list of characters used to indicate the start of a
+        comment (default '#'). Comment lines are ignored.
+    delimiter: str
+        Delimiter for the data in the block (default use whitespace). For
+        comma-separated data blocks, set delimiter to ','.
+    unpack: bool
+        Return data as a sequence of columns that allows tuple unpacking such
+        as x, y = load_data(FILENAME, unpack=True). Note transposing the
+        loaded array as load_data(FILENAME).T has the same effect.
+    usecols:
+        Zero-based index of columns to be loaded, by default use all detected
+        columns. The reading skips data blocks that do not have the usecols-
+        specified columns.
+
+    Returns
+    -------
+    data_block: ndarray
+        A numpy array containing the found data block. (This is not returned
+        if headers is enabled.)
+    hdata: dict
+        If headers are enabled, return a dictionary of parameters read from
+        the header.
+    """
+    from numpy import array, loadtxt
+
+    # for storing header data
+    hdata = {}
+    # determine the arguments
+    delimiter = kwargs.get("delimiter")
+    usecols = kwargs.get("usecols")
+    # require at least one column of floating point values
+    mincv = (1, 1)
+    # but if usecols is specified, require sufficient number of columns
+    # where the used columns contain floats
+    if usecols is not None:
+        hiidx = max(-min(usecols), max(usecols) + 1)
+        mincv = (hiidx, len(set(usecols)))
+
+    # Check if a line consists of floats only and return their count
+    # Return zero if some strings cannot be converted.
+    def countcolumnsvalues(line):
+        try:
+            words = line.split(delimiter)
+            # remove trailing blank columns
+            while words and not words[-1].strip():
+                words.pop(-1)
+            nc = len(words)
+            if usecols is not None:
+                nv = len([float(words[i]) for i in usecols])
+            else:
+                nv = len([float(w) for w in words])
+        except (IndexError, ValueError):
+            nc = nv = 0
+        return nc, nv
+
+    # Check if file exists before trying to open
+    filename = Path(filename)
+    if not filename.is_file():
+        raise IOError(
+            (
+                f"File {str(filename)} cannot be found. "
+                "Please rerun the program specifying a valid filename."
+            )
+        )
+
+    # make sure fid gets cleaned up
+    with open(filename, "rb") as fid:
+        # search for the start of datablock
+        start = ncvblock = None
+        fpos = (0, 0)
+        nrows = 0
+        for line in fid:
+            # decode line
+            dline = line.decode()
+            # find header information if requested
+            if headers:
+                hpair = dline.split(hdel)
+                flag = True
+                # ensure number of non-blank arguments is two
+                if len(hpair) != 2:
+                    flag = False
+                else:
+                    # ignore if an argument is blank
+                    hpair[0] = hpair[0].strip()  # name of data entry
+                    hpair[1] = hpair[1].strip()  # value of entry
+                    if not hpair[0] or not hpair[1]:
+                        flag = False
+                    else:
+                        # check if row has an ignore tag
+                        if hignore is not None:
+                            for tag in hignore:
+                                taglen = len(tag)
+                                if (
+                                    len(hpair[0]) >= taglen
+                                    and hpair[0][:taglen] == tag
+                                ):
+                                    flag = False
+                # add header data
+                if flag:
+                    name = hpair[0]
+                    value = hpair[1]
+                    # check if data value should be stored as float
+                    if validators.is_number(hpair[1]):
+                        value = float(hpair[1])
+                    hdata.update({name: value})
+            # continue search for the start of datablock
+            fpos = (fpos[1], fpos[1] + len(line))
+            line = dline
+            ncv = countcolumnsvalues(line)
+            if ncv < mincv:
+                start = None
+                continue
+            # ncv is acceptable here, require the same number of columns
+            # throughout the datablock
+            if start is None or ncv != ncvblock:
+                ncvblock = ncv
+                nrows = 0
+                start = fpos[0]
+            nrows += 1
+            # block was found here!
+            if nrows >= minrows:
+                break
+
+        # Return header data if requested
+        if headers:
+            return hdata  # Return, so do not proceed to reading datablock
+
+        # Return an empty array when no data found.
+        # loadtxt would otherwise raise an exception on loading from EOF.
+        if start is None:
+            data_block = array([], dtype=float)
+        else:
+            fid.seek(start)
+            # always use usecols argument so that loadtxt does not crash
+            # in case of trailing delimiters.
+            kwargs.setdefault("usecols", list(range(ncvblock[0])))
+            data_block = loadtxt(fid, **kwargs)
+    return data_block
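
Note on ``load_data``: the docstring describes the data block as the first run of at least ``minrows`` rows with a constant number of float columns. The sketch below is a simplified, standard-library-only illustration of that idea and of the ``hdel`` header split. It is not the function added in this diff: the names ``parse_row``, ``find_data_block`` and ``parse_header`` are hypothetical, it works on a list of lines rather than a file, it ignores ``usecols``, ``comments`` and ``hignore``, and it uses ``float()`` where the real code calls ``validators.is_number``.

```python
def parse_row(line, delimiter=None):
    """Return the floats on a line, or [] if any field is not a float."""
    words = line.split(delimiter)
    # Drop trailing blank columns, as the real helper does.
    while words and not words[-1].strip():
        words.pop()
    try:
        return [float(w) for w in words]
    except ValueError:
        return []


def find_data_block(lines, minrows=10, delimiter=None):
    """Return the first run of >= minrows numeric rows of equal width."""
    block = []
    for line in lines:
        row = parse_row(line, delimiter)
        if row and (not block or len(row) == len(block[0])):
            block.append(row)
            continue
        # The run is broken here: keep it if long enough, else start over.
        if len(block) >= minrows:
            break
        block = [row] if row else []
    return block if len(block) >= minrows else []


def parse_header(lines, hdel="="):
    """Collect 'name = value' lines into a dict, storing numbers as float."""
    hdata = {}
    for line in lines:
        parts = [p.strip() for p in line.split(hdel)]
        if len(parts) == 2 and all(parts):
            name, value = parts
            try:
                value = float(value)
            except ValueError:
                pass
            hdata[name] = value
    return hdata


text = """\
qmax = 25
temperature rw
300 0.12
310 0.13
320 0.15
330 0.40
"""
lines = text.splitlines()
rows = find_data_block(lines, minrows=3)
header = parse_header(lines)
```

With ``minrows=50`` the same input yields an empty result, which mirrors the ``minrows`` behaviour described in the parsers example.
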