
OBPIH-7884 Test downloaded documents' content #90

Open
alannadolny wants to merge 11 commits into main from OBPIH-7884

Conversation

@alannadolny
Collaborator

No description provided.

@alannadolny alannadolny self-assigned this May 29, 2026
Comment on lines +75 to 78
const recipientName = rowValues.recipient?.name;
if (!_.isNil(recipientName)) {
await this.recipientSelect.findAndSelectOption(recipientName);
}
Collaborator Author

A TypeScript error was introduced here a long time ago, so here's the fix.

Comment thread package.json
"eslint-plugin-playwright": "~1.0.1",
"eslint-plugin-promise": "~6.0.0",
"eslint-plugin-simple-import-sort": "~10.0.0",
"pdfjs-dist": "~3.11.174",
Member

Is this https://github.com/mozilla/pdfjs-dist? It looks like it is no longer supported as of 2024.

Comment thread src/utils/pdfUtils.ts
return content.items
.map((item) => ('str' in item ? (item as TextItem).str : ''))
.join(' ');
})
Member

So, if I understand correctly, this pulls all the text out of the PDF as a string, and then pdfContainsValues does a string search to check whether the PDF contains some text?
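
For reference, the approach described in this comment could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: `TextLikeItem`, `joinTextItems`, `normalizeWhitespace`, and `textContainsValues` are hypothetical names, and the real `pdfContainsValues` signature is not shown in this thread. The sketch works on already-extracted text items, so it does not call any PDF library.

```typescript
// Hypothetical item shape: only the `str` field matters for this sketch.
// Non-text items (for example marked-content markers) carry no `str`.
type TextLikeItem = { str: string } | { type: string };

// Join the text of every item with spaces, mirroring the `'str' in item` guard
// from the diff above. Items without `str` contribute an empty string.
function joinTextItems(items: TextLikeItem[]): string {
  return items.map((item) => ('str' in item ? item.str : '')).join(' ');
}

// Collapse runs of whitespace so that line breaks or doubled spaces in the
// extracted text do not cause false negatives.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, ' ').trim();
}

// Assumed behaviour: succeed only if every expected value occurs somewhere
// in the extracted text as a plain substring.
function textContainsValues(text: string, values: string[]): boolean {
  const haystack = normalizeWhitespace(text);
  return values.every((value) => haystack.includes(normalizeWhitespace(value)));
}
```

One caveat with a plain substring search: it only confirms that the text appears somewhere in the document, not that it appears in the expected field or table cell.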

Member

@ewaterman ewaterman left a comment

The tests themselves look good from what I can tell. The only concern is that the new dependency is deprecated. I don't know if there's a similar alternative with proper support.

A brief Google search tells me Mozilla's PDF.js is a standard solution for reading PDF contents.

The example code I found:

// Assumed import: the exact entry point depends on the pdfjs-dist version and runtime.
import * as pdfjsLib from "pdfjs-dist";

async function extractTextFromPdf(urlOrBuffer) {
  const loadingTask = pdfjsLib.getDocument(urlOrBuffer);
  const pdf = await loadingTask.promise;
  let fullText = "";

  // Loop through every page (pages are 1-indexed) to extract text snippets
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const textContent = await page.getTextContent();

    // Concatenate individual text items into a single page string;
    // items without a `str` field (marked-content markers) are treated as empty
    const pageText = textContent.items
      .map((item) => ("str" in item ? item.str : ""))
      .join(" ");
    fullText += pageText + "\n";
  }

  return fullText;
}
