Possible bug in reading zero width no-break space character #2369

@AviParikh

Environment

  • Pythonnet version: 3.0.3
  • Python version: 3.10
  • Operating System: Windows 11
  • .NET Runtime: 8.0.4

Details

  • Describe what you were trying to get done.

I want to read a file that may contain zero width no-break space (U+FEFF) characters anywhere in the file. When I read the file through .NET from Python using Python.NET, these characters are dropped whenever they appear at the start of a line.

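from pythonnet import load
load("coreclr")  # select the .NET (Core) runtime before importing clr
import clr
clr.AddReference('System')
clr.AddReference('System.Runtime')
clr.AddReference('System.IO')

import System
file_path = r"path\to\zwnbsp_test_data.txt"

# Read all lines through .NET, then index into the result from Python.
result = System.IO.File.ReadAllLines(file_path)
# Print the first character of the second line.
print(str(result[1][0]))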
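
Because U+FEFF has no visible glyph, printing the character itself is hard to check by eye. A minimal sketch of a helper that prints code points instead, similar in spirit to the `char.ConvertToUtf32` check used for the C# comparison. The name `code_points` and the sample string are hypothetical and not part of Python.NET:

```python
def code_points(text):
    """Return each character of text as a 'U+XXXX' label."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Stand-in for one line returned by ReadAllLines (hypothetical sample data).
sample_line = "\ufeffabc"
print(code_points(sample_line))
```

Applied to `result[1]`, this would show directly whether the first code point is U+FEFF or whether the line already starts with the following character.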

Native C# code does not seem to have this problem:

C#
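using System;
using System.IO;

// Read the same file directly in C#.
string[] lines = File.ReadAllLines(@"path\to\zwnbsp_test_data.txt");

// Print the code points of the first two characters of the second line.
int firstChar = char.ConvertToUtf32(lines[1], 0);
int secondChar = char.ConvertToUtf32(lines[1], 1);
Console.WriteLine(firstChar);
Console.WriteLine(secondChar);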

zwnbsp_test_data.txt
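
For anyone without the attachment, a comparable file can be generated locally. This is only a sketch: the line contents below are assumed for illustration and are not the contents of the attached file. It writes U+FEFF at the start of the second line and in the middle of the third, then reads the file back with Python's own I/O as a baseline.

```python
import os
import tempfile

# Hypothetical contents: line 2 starts with U+FEFF, line 3 has it mid-line.
lines = ["first line", "\ufeffsecond line", "third\ufeff line"]

path = os.path.join(tempfile.mkdtemp(), "zwnbsp_test_data.txt")

# Plain "utf-8" (not "utf-8-sig"), so no extra BOM is written at the file start.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("\n".join(lines) + "\n")

# Read the file back without involving .NET.
with open(path, encoding="utf-8") as f:
    read_back = f.read().splitlines()

# Show the code point of the first character of each line.
print([hex(ord(line[0])) for line in read_back])
```

Pure Python keeps the leading U+FEFF on the second line, which gives a reference point when comparing against what Python.NET returns for the same file.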

Some screenshots to help illustrate:
image
image

  • If there was a crash, please include the traceback here.
    No crash
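
One possible explanation, offered as an unverified assumption and not a confirmed diagnosis: if the .NET string is handed to Python as UTF-16 and decoded with byte-order detection, a leading U+FEFF would be consumed as a byte order mark. The sketch below reproduces that effect with Python's standard codecs only. It does not call Python.NET, and whether Python.NET converts strings this way is not established here.

```python
text = "\ufeffabc"

# UTF-16 little-endian code units, roughly how a .NET string is laid out
# in memory (assumption: no separate BOM is prepended).
raw = text.encode("utf-16-le")

# "utf-16" detects byte order and consumes a leading U+FEFF as a BOM.
autodetected = raw.decode("utf-16")

# "utf-16-le" has a fixed byte order and keeps U+FEFF as an ordinary character.
fixed_order = raw.decode("utf-16-le")

print(len(autodetected), len(fixed_order))
```

If this is the mechanism, it would be consistent with the character disappearing only at the start of a line, since each line arrives as its own string and a BOM is only recognized at the very beginning of the data.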
