DiffLine.rawContent() returns string instead of Buffer, causing non-UTF-8 encoding corruption #2038

@liangjingyang

Description

Thanks to the nodegit maintainers for this excellent library!

This issue was debugged with the assistance of Cursor and Opus 4.5.

Current Behavior

DiffLine.rawContent() returns a JavaScript string, but the underlying libgit2 field git_diff_line.content is a raw byte pointer (const char *) whose length is given by content_len. It is not NUL-terminated and may contain bytes that are not UTF-8 (e.g., GBK- or GB18030-encoded text).
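A small Node.js illustration of that (pointer, length) contract follows. It is a simulation, not nodegit code: the buffer stands in for the memory behind git_diff_line.content and is assumed to hold the GBK bytes for "中文" (D6 D0 CE C4) followed by unrelated bytes, and the names memory, lineBytes and isValidUtf8 are made up for the example.

```javascript
// Illustration only: simulate the memory that git_diff_line.content points at.
// The first contentLen bytes are the line ("中文" encoded as GBK); the bytes
// after it belong to other content, and no NUL follows the line directly.
const memory = Buffer.concat([
  Buffer.from([0xd6, 0xd0, 0xce, 0xc4]), // the line itself (GBK)
  Buffer.from("xyz"),                    // unrelated bytes
  Buffer.from([0x00]),                   // a NUL somewhere further on
]);
const contentLen = 4;

// Correct: take exactly contentLen bytes, as libgit2 intends.
const lineBytes = memory.subarray(0, contentLen);

// Incorrect: C-string semantics read up to the first NUL and overrun the line.
const overrun = memory.subarray(0, memory.indexOf(0));

// Strict UTF-8 check: fatal: true makes decode() throw on malformed input.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

console.log(lineBytes.length);       // 4
console.log(overrun.length);         // 7
console.log(isValidUtf8(lineBytes)); // false
```

Reading to the first NUL picks up bytes that belong to other content, and the four bytes that do belong to the line are rejected by a strict UTF-8 decoder.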

The current implementation in lib/diff_line.js:

var _rawContent = DiffLine.prototype.content;  // Save original native method

DiffLine.prototype.content = function() {
  // ...
  this._cache.content = Buffer.from(this.rawContent())
    .slice(0, this.contentLen())
    .toString("utf8");
  return this._cache.content;
};

DiffLine.prototype.rawContent = function() {
  return _rawContent.call(this);  // Calls native binding
};

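The problem is that _rawContent (the native binding) has already converted the const char * into a JavaScript string, presumably with v8::String::NewFromUtf8() or similar, which assumes UTF-8. If so, every byte sequence that is not valid UTF-8 is replaced with U+FFFD during that conversion, so the original bytes are lost before content() or any caller sees them. Re-encoding the string with Buffer.from() also produces a different byte sequence (each U+FFFD becomes three bytes), so slicing it to contentLen() bytes can cut at the wrong place.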
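The loss can be reproduced without nodegit. The sketch below assumes the native layer decodes the bytes as UTF-8, as suspected above, and emulates that step with Buffer#toString("utf8"), which substitutes U+FFFD for malformed sequences; it then repeats what content() does with the resulting string. The variable names are illustrative only.

```javascript
// GBK bytes for "中文"; in a diff these would be the first contentLen bytes.
const original = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]);
const contentLen = original.length;

// Emulate the suspected native step: decode the bytes as UTF-8 into a string.
// Malformed sequences become U+FFFD, so the original bytes are gone.
const rawString = original.toString("utf8");

// What content() does next: re-encode as UTF-8 and cut at contentLen bytes.
const reencoded = Buffer.from(rawString);
const content = reencoded.subarray(0, contentLen).toString("utf8");

console.log(rawString.includes("\uFFFD")); // true
console.log(reencoded.equals(original));   // false
console.log(reencoded.length > contentLen); // true
```

No decoder can map the U+FFFD characters back to 中文, which is why the bytes need to leave the native layer untouched.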

Expected Behavior

rawContent() should return a Buffer containing the original bytes, allowing users to detect and decode the encoding themselves:

DiffLine.prototype.rawContent = function() {
  // Return Buffer instead of string
  return _rawContent.call(this);  // Should return Buffer
};

DiffLine.prototype.content = function() {
  // ... existing implementation
  return this.rawContent()
    .slice(0, this.contentLen())
    .toString("utf8");
};
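If rawContent() returned a Buffer as proposed, callers could pick the encoding themselves. The following is a minimal sketch under that assumption: decodeDiffLine is a hypothetical helper, not part of nodegit, and it accepts any object exposing rawContent() (returning a Buffer) and contentLen(). It tries strict UTF-8 first and falls back to a caller-chosen encoding. Legacy encodings such as "gbk" or "gb18030" in TextDecoder require a Node.js build with full ICU data, which the official binaries include.

```javascript
// Hypothetical helper, not part of nodegit. Assumes line.rawContent() returns a
// Buffer (the behavior proposed in this issue) and line.contentLen() its length.
function decodeDiffLine(line, fallbackEncoding = "gb18030") {
  // Only the first contentLen() bytes belong to this line.
  const bytes = line.rawContent().subarray(0, line.contentLen());
  try {
    // fatal: true makes decode() throw a TypeError on malformed UTF-8
    // instead of silently substituting U+FFFD.
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { text, encoding: "utf-8" };
  } catch (err) {
    if (!(err instanceof TypeError)) throw err;
    // Not UTF-8: decode with the caller's fallback. Encodings such as "gbk"
    // or "gb18030" need a Node.js build with full ICU data.
    const text = new TextDecoder(fallbackEncoding).decode(bytes);
    return { text, encoding: fallbackEncoding };
  }
}

// Usage with a stand-in object shaped like the proposed DiffLine: GBK bytes
// for "中文" followed by one byte beyond contentLen().
const fakeLine = {
  rawContent: () => Buffer.from([0xd6, 0xd0, 0xce, 0xc4, 0x78]),
  contentLen: () => 4,
};
console.log(decodeDiffLine(fakeLine));
```

Trying strict UTF-8 first keeps the common case unchanged, while the fallback only runs for lines that would otherwise have been corrupted.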

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests