Outlook connector reference

This page contains reference documentation for the Outlook connector in Lakeflow Connect.

Connection properties

When you create the Unity Catalog connection, you must specify the following properties. See Configure authentication to Microsoft Outlook for how to obtain these values.

| Property | Description |
| --- | --- |
| Client ID | The Application (client) ID from the Microsoft Entra ID app registration. |
| Client secret | The client secret value from the Microsoft Entra ID app registration. |
| Tenant ID | The Directory (tenant) ID from the Microsoft Entra ID app registration. |

Destination schema

The connector produces a single table, email_messages, under the default schema.

  • Primary key: (mailbox, outlook_message_id)
  • Incremental sync cursor: received_at, tracked per mailbox and folder
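To illustrate how the primary key and the cursor fit together, the following is a minimal sketch, not the connector's implementation. It assumes rows arrive as Python dicts keyed by the column names below; the function name apply_batch and the in-memory table and cursors dicts are hypothetical.

```python
from datetime import datetime, timezone


def apply_batch(table, cursors, batch):
    """Upsert rows by primary key and advance one cursor per (mailbox, folder)."""
    for row in batch:
        # Primary key: (mailbox, outlook_message_id). A later copy of the
        # same message replaces the earlier one instead of duplicating it.
        key = (row["mailbox"], row["outlook_message_id"])
        table[key] = row

        # The cursor is the latest received_at seen for this mailbox and folder.
        scope = (row["mailbox"], row["folder"])
        if scope not in cursors or row["received_at"] > cursors[scope]:
            cursors[scope] = row["received_at"]
    return table, cursors


# Hypothetical sample data: the same message arrives twice, then a message
# from a different folder in the same mailbox.
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
batch = [
    {"mailbox": "a@example.com", "outlook_message_id": "m1",
     "folder": "Inbox", "received_at": t1, "is_read": False},
    {"mailbox": "a@example.com", "outlook_message_id": "m1",
     "folder": "Inbox", "received_at": t1, "is_read": True},
    {"mailbox": "a@example.com", "outlook_message_id": "m2",
     "folder": "Sent Items", "received_at": t2, "is_read": True},
]
table, cursors = apply_batch({}, {}, batch)
```

In this sketch, a later sync for a given mailbox and folder would only need to request messages with received_at later than the stored cursor for that pair.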

email_messages

| Column | Type | Description |
| --- | --- | --- |
| mailbox | string | Email address of the mailbox. Part of the primary key. |
| outlook_message_id | string | Unique message ID from the Microsoft Graph API. Part of the primary key. |
| internet_message_id | string | RFC 2822 internet message ID. |
| conversation_id | string | Conversation thread ID. |
| folder | string | Folder display name (for example, Inbox). |
| to_recipients | array<string> | List of recipient email addresses. |
| cc_recipients | array<string> | List of CC recipient email addresses. |
| bcc_recipients | array<string> | List of BCC recipient email addresses. |
| from | string | Sender email address. |
| sender | string | Actual sender email address (might differ from the from column when the message is sent on behalf of another user). |
| reply_to | array<string> | List of reply-to email addresses. |
| subject | string | Email subject line. |
| importance | string | Importance level (for example, normal, high, low). |
| is_read | boolean | Whether the message has been read. |
| in_reply_to | string | Internet message ID of the parent message, from email headers. |
| references | array<string> | Array of referenced message IDs, from email headers. |
| body_preview | string | Preview of the email body. |
| full_body_content | string | Complete body content. Format is HTML or plain text, based on the body_format option. |
| unique_body_content | string | Unique body content, excluding quoted text from replies. |
| received_at | timestamp | Date and time the message was received (ISO-8601). Used as the incremental sync cursor. |
| sent_at | timestamp | Date and time the message was sent (ISO-8601). |
| categories | array<string> | User-defined categories or tags on the message. |
| attachments | array<struct> | Array of attachment structs. Omitted when attachment_mode is NONE. See Attachment struct. |

Attachment struct

| Field | Type | Description |
| --- | --- | --- |
| attachment_id | string | ID of the attachment from the Microsoft Graph API. |
| file_name | string | Original filename. |
| mime_type | string | MIME type (for example, application/pdf). |
| size | bigint | File size in bytes. |
| attachment_kind | string | Type indicator (for example, fileAttachment, itemAttachment). |
| is_inline | boolean | Whether the attachment is inline (for example, an embedded image in a signature). |
| content | binary | Base64-encoded file content. |
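As a rough illustration of working with the attachment struct, the sketch below picks the non-inline attachments from one row and decodes their content. It assumes each attachment is available as a Python dict with the fields above and that content holds Base64 text, as the description states; if your reader already returns raw file bytes, skip the decode step. The function name decode_non_inline is hypothetical.

```python
import base64


def decode_non_inline(attachments):
    """Return {file_name: raw bytes} for attachments that are not inline."""
    files = {}
    # attachments may be missing (None) when attachment_mode is NONE.
    for att in attachments or []:
        if att["is_inline"]:
            continue  # skip embedded images such as signature logos
        # Assumption: content is Base64-encoded, per the field description.
        files[att["file_name"]] = base64.b64decode(att["content"])
    return files


# Hypothetical sample row content.
sample_attachments = [
    {"attachment_id": "a1", "file_name": "report.pdf",
     "mime_type": "application/pdf", "size": 5,
     "attachment_kind": "fileAttachment", "is_inline": False,
     "content": base64.b64encode(b"hello")},
    {"attachment_id": "a2", "file_name": "logo.png",
     "mime_type": "image/png", "size": 3,
     "attachment_kind": "fileAttachment", "is_inline": True,
     "content": base64.b64encode(b"png")},
]
decoded = decode_non_inline(sample_attachments)
```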

Connector options

These options are specified under outlook_options in the pipeline specification. See Filter combination logic for how multiple filter options interact.

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| include_mailboxes | array<string> | No | All accessible mailboxes | List of mailbox email addresses to sync. If not specified, the connector discovers and ingests all accessible mailboxes in the tenant using the Microsoft Graph GET /users endpoint. |
| include_folders | array<string> | No | ["Inbox"] | List of folder display names to sync. Examples: Inbox, Sent Items, Custom_Folder. Matching is case-insensitive. |
| include_senders | array<string> | No | All senders | Filter emails by sender email address using exact match. Example: user@vendor.com. |
| include_subjects | array<string> | No | All subjects | Filter emails by subject line. Values ending with * use prefix match; other values use substring match. Example: "Invoice" (substring), "Re:*" (prefix). |
| start_date | string | No | Complete history from epoch | Start date for the initial sync in YYYY-MM-DD format. Determines the earliest date from which to sync historical data. |
| body_format | string | No | TEXT_HTML | Controls the email body content format. TEXT_HTML: preserves full HTML formatting. TEXT_PLAIN: converts the body to plain text (recommended for AI/RAG pipelines to reduce token usage). |
| attachment_mode | string | No | ALL | Controls which attachments to ingest. ALL: all attachments. NON_INLINE_ONLY: non-inline attachments only (recommended to avoid corporate signature images). INLINE_ONLY: inline attachments only. NONE: no attachments (skips attachment API calls entirely). |
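The options above can be checked before a pipeline is submitted. The following is a sketch of such a pre-flight check, not part of the connector. The option names and allowed values come from the table; the validate_outlook_options helper and the sample values are hypothetical, and how outlook_options is nested inside the rest of the pipeline specification is not shown here.

```python
from datetime import datetime

# Allowed values taken from the option descriptions above.
ALLOWED = {
    "body_format": {"TEXT_HTML", "TEXT_PLAIN"},
    "attachment_mode": {"ALL", "NON_INLINE_ONLY", "INLINE_ONLY", "NONE"},
}
LIST_OPTIONS = (
    "include_mailboxes", "include_folders",
    "include_senders", "include_subjects",
)


def validate_outlook_options(options):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for name, allowed in ALLOWED.items():
        if name in options and options[name] not in allowed:
            errors.append(f"{name} must be one of {sorted(allowed)}")
    for name in LIST_OPTIONS:
        value = options.get(name)
        if value is not None and not (
            isinstance(value, list) and all(isinstance(v, str) for v in value)
        ):
            errors.append(f"{name} must be a list of strings")
    if "start_date" in options:
        try:
            datetime.strptime(options["start_date"], "%Y-%m-%d")
        except (TypeError, ValueError):
            errors.append("start_date must be in YYYY-MM-DD format")
    return errors


# Hypothetical example values for illustration only.
outlook_options = {
    "include_mailboxes": ["finance@example.com"],
    "include_folders": ["Inbox", "Sent Items"],
    "include_subjects": ["Invoice", "Re:*"],
    "start_date": "2024-01-01",
    "body_format": "TEXT_PLAIN",
    "attachment_mode": "NON_INLINE_ONLY",
}
problems = validate_outlook_options(outlook_options)
```

All options are optional, so an empty dict also passes this check and the defaults from the table apply.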

Filter combination logic

An email message is ingested when it matches at least one value from each specified filter category. Multiple filter categories are combined with AND logic; values within a single category use OR logic.

Example: include_folders=["Inbox"] AND include_senders=["user@vendor.com", "alerts@system.io"] ingests emails from the Inbox folder that are sent by either user@vendor.com OR alerts@system.io.
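The combination rule can be sketched as a small predicate. This is an illustration of the documented logic, not the connector's code. It assumes messages are Python dicts with folder, from, and subject keys, that sender filtering compares against the from column, and that sender and subject comparisons are case-sensitive (the reference states case-insensitivity only for folders). The function names are hypothetical.

```python
def subject_matches(subject, pattern):
    """Values ending with * use prefix match; other values use substring match."""
    if pattern.endswith("*"):
        return subject.startswith(pattern[:-1])
    return pattern in subject


def should_ingest(message, options):
    """AND across filter categories, OR across values within one category."""
    # include_folders defaults to ["Inbox"]; matching is case-insensitive.
    folders = options.get("include_folders", ["Inbox"])
    if message["folder"].lower() not in {f.lower() for f in folders}:
        return False

    # Unspecified categories do not restrict anything.
    senders = options.get("include_senders")
    if senders is not None and message["from"] not in senders:
        return False

    subjects = options.get("include_subjects")
    if subjects is not None and not any(
        subject_matches(message["subject"], p) for p in subjects
    ):
        return False
    return True


# The example from the text above.
options = {
    "include_folders": ["Inbox"],
    "include_senders": ["user@vendor.com", "alerts@system.io"],
}
msg = {"folder": "inbox", "from": "alerts@system.io", "subject": "Disk usage"}
ingested = should_ingest(msg, options)
```

Adding include_subjects=["Invoice"] to the same options would narrow the result further, because a message must then also match at least one subject value.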