How to Extract Article Content from Yahoo News

Updated January 14, 2026
Tags: content extraction, article body, multiple selectors, fallback

Extracting Article Content from Yahoo News

To extract the body of a Yahoo News article, try several CSS selectors in order and fall back to concatenating paragraphs if none of them match.

Content Selectors

Try selectors in order of preference:

const contentSelectors = [
  '.article_body',           // Primary selector
  '.sc-fLlhyt',              // Alternative class
  'article .highLightSearchTarget',  // Highlighted content
  '[class*="article"] [class*="body"]',  // Dynamic class
  'article p',               // Fallback to paragraphs
];

Extraction Logic

// page.evaluate runs its callback in the browser, where Node-side variables
// are not visible, so the selector list is passed in as an argument.
const articleData = await page.evaluate((selectors) => {
  let content = '';

  // Try each selector in order and keep the first substantial match
  for (const selector of selectors) {
    const text = document.querySelector(selector)?.textContent?.trim() ?? '';
    if (text.length > 100) {
      content = text;
      break;
    }
  }

  // Fallback: concatenate all paragraphs
  if (!content) {
    const paragraphs = Array.from(document.querySelectorAll('article p, .article p'));
    content = paragraphs.map(p => p.textContent?.trim()).filter(Boolean).join('\n\n');
  }

  return { content };
}, contentSelectors);
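The selector loop above can also be factored into a small helper that is testable without a browser. The sketch below is one way to do that, not the original author's code: `firstSubstantialText`, the `TextRoot` interface, and the `minLength` parameter are hypothetical names introduced for illustration, and the 100-character default simply mirrors the threshold used above.

```typescript
// Minimal structural type (an assumption for this sketch): anything with a
// querySelector that returns an object exposing textContent.
interface TextRoot {
  querySelector(selector: string): { textContent: string | null } | null;
}

// Returns the trimmed text of the first selector whose match is longer
// than minLength characters, or '' when no selector qualifies.
function firstSubstantialText(
  root: TextRoot,
  selectors: string[],
  minLength = 100,
): string {
  for (const selector of selectors) {
    const text = root.querySelector(selector)?.textContent?.trim() ?? '';
    if (text.length > minLength) {
      return text;
    }
  }
  return '';
}
```

Because page.evaluate serializes its callback, a helper like this would need to be defined inside the callback (or otherwise injected into the page) rather than referenced from Node scope.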

Extract Images

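// Runs in the page context (for example, inside page.evaluate)
const images: string[] = [];
// The type parameter tells TypeScript these are <img> elements, so img.src type-checks
const imgElements = document.querySelectorAll<HTMLImageElement>('article img, .article img');

imgElements.forEach(img => {
  const src = img.src;
  // Skip site logos and icons
  if (src && !src.includes('logo') && !src.includes('icon')) {
    images.push(src);
  }
});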
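The filter above can be expressed as a pure function over a list of URLs, which makes it easy to unit test. This is a sketch under assumptions: `filterArticleImages` and `EXCLUDED_FRAGMENTS` are hypothetical names, and the de-duplication and case-insensitive matching are additions that the original snippet does not perform.

```typescript
// Substrings that mark an image as site chrome rather than article content.
const EXCLUDED_FRAGMENTS = ['logo', 'icon'];

// Keeps non-empty URLs that contain none of the excluded fragments,
// preserving first-seen order and dropping exact duplicates.
function filterArticleImages(srcs: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const src of srcs) {
    if (!src) continue;
    const lower = src.toLowerCase();
    if (EXCLUDED_FRAGMENTS.some(fragment => lower.includes(fragment))) continue;
    if (seen.has(src)) continue;
    seen.add(src);
    result.push(src);
  }
  return result;
}
```

Note that plain substring matching is coarse: a URL containing a word such as "silicon" would also be dropped because it includes "icon".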

Extract Metadata

// Category
const categoryElement = document.querySelector('.category, [class*="category"]');
const category = categoryElement?.textContent?.trim() || '';

// Publish date
const dateElement = document.querySelector('time, .date, [class*="date"]');
const publishedDate = dateElement?.getAttribute('datetime') ||
                      dateElement?.textContent?.trim() || '';
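Because the date can come from either a machine-readable `datetime` attribute or free-form element text, it may help to normalize it before storing. A minimal sketch, assuming the built-in `Date` parser is acceptable for this purpose; `normalizePublishedDate` is a hypothetical helper, and text that `Date` cannot parse is reported as `null` rather than guessed at.

```typescript
// Converts a raw date string to ISO 8601 (UTC), or returns null when the
// input is empty or cannot be parsed by the built-in Date constructor.
function normalizePublishedDate(raw: string): string | null {
  const trimmed = raw.trim();
  if (!trimmed) return null;
  const parsed = new Date(trimmed);
  return Number.isNaN(parsed.getTime()) ? null : parsed.toISOString();
}
```

Parsing of non-ISO strings by `Date` is implementation-dependent, so free-form text taken from the element may need a dedicated parser.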

Validation

if (!articleData.content) {
  throw new Error('Could not extract article content');
}

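Always check that content was extracted before using the result. An empty string means that neither the selectors nor the paragraph fallback matched anything.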
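A stricter check than the empty-string test might also reject suspiciously short content. The sketch below assumes a hypothetical `assertArticleContent` helper and `ArticleData` interface, with an arbitrary 100-character minimum chosen only to match the extraction threshold; none of these come from the original.

```typescript
// Shape of the object returned by the extraction step (content only).
interface ArticleData {
  content: string;
}

// Throws when the content is empty or shorter than minLength characters
// after trimming; returns normally otherwise.
function assertArticleContent(article: ArticleData, minLength = 100): void {
  const length = article.content.trim().length;
  if (length === 0) {
    throw new Error('Could not extract article content');
  }
  if (length < minLength) {
    throw new Error(`Article content too short: ${length} characters`);
  }
}
```

Since the paragraph fallback can legitimately return less than 100 characters for very brief articles, the minimum is left as a parameter so it can be lowered.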