Attributes within <html> tag are stripped in extension, but not in cloud version

Hello,

I’m seeing divergent behavior between the Chrome extension and the distill.io cloud version of the monitor.

Specifically, in the Chrome version, attributes of the <html> tag itself appear stripped, while in the cloud version, they are retained (the desired behavior for me).

With the config JSON below, monitoring https://chatgpt.com/ works correctly in the cloud monitor and captures the “data-build” attribute of the <html> tag, but the same attribute shows up as “undefined” in the local extension version.

Moreover, when I view the diff in source view in the extension, I can see that it has performed some source sanitization: it strips the <html> attributes (which I happen to need) and removes the <meta> tags entirely (which doesn’t concern my use case). For example:

<html><head><link rel="modulepreload" href="https://cdn.oaistatic.com/assets/manifest-316c9336.js">

Whereas when viewing the same diff in the cloud monitor version, no such sanitization is performed, and the “data-build” attribute within the <html> tag which I need to monitor for changes is still present:

<html lang="en-US" data-build="prod-eefe4fe3afec2efde62cdc0669965885fde88495" dir="ltr" class=""><head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1"><link rel="modulepreload" href="https://cdn.oaistatic.com/assets/manifest-316c9336.js">

And lastly, the config JSON I’m using:

{
  "selections": [
    {
      "frames": [
        {
          "index": 0,
          "excludes": [],
          "includes": [
            {
              "expr": "/html",
              "type": "xpath",
              "fields": [
                {
                  "type": "attribute",
                  "name": "data-build"
                }
              ]
            }
          ]
        }
      ],
      "dynamic": false,
      "delay": 0
    }
  ],
  "regexp": {
    "expr": "",
    "flags": "gim"
  },
  "ignoreEmptyText": true,
  "includeStyle": false,
  "viewport": {},
  "dataAttr": "text",
  "params": {},
  "blockAdsAndCookies": false
}
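
To make the expected result concrete, here is a minimal Python sketch of what the selection above is meant to return: the “data-build” attribute of the root <html> element. This is not Distill's code, only an illustration built on the standard library's html.parser; the names HtmlAttrReader and read_html_attr are hypothetical, and the two input strings are the source snippets quoted earlier in this post.

```python
from html.parser import HTMLParser


class HtmlAttrReader(HTMLParser):
    """Records the attributes of the first <html> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.html_attrs = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; keep only the first <html> tag.
        if tag == "html" and self.html_attrs is None:
            self.html_attrs = dict(attrs)


def read_html_attr(source, name):
    """Return the named attribute of <html>, or None if it is absent."""
    reader = HtmlAttrReader()
    reader.feed(source)
    reader.close()
    return (reader.html_attrs or {}).get(name)


# Source as shown by the extension (attributes already stripped).
extension_source = (
    '<html><head><link rel="modulepreload" '
    'href="https://cdn.oaistatic.com/assets/manifest-316c9336.js">'
)

# Source as shown by the cloud monitor (attributes retained).
cloud_source = (
    '<html lang="en-US" '
    'data-build="prod-eefe4fe3afec2efde62cdc0669965885fde88495" '
    'dir="ltr" class=""><head><meta charset="UTF-8">'
    '<meta name="viewport" content="width=device-width, initial-scale=1">'
    '<link rel="modulepreload" '
    'href="https://cdn.oaistatic.com/assets/manifest-316c9336.js">'
)

print(read_html_attr(extension_source, "data-build"))  # None
print(read_html_attr(cloud_source, "data-build"))
```

The first call finds no attribute, which matches the “undefined” value I get from the extension; the second returns the build identifier that the cloud monitor captures.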

Any thoughts as to whether I can do anything locally to disable the sanitization behavior?

Thanks in advance!

Tom


thanks for flagging this @hey_tommy. the extension’s module to extract html out is slightly different from cloud monitor’s. we will get this normalized and let you know. cheers!


Thanks - I appreciate the reply.
Cheers!