Fetching Top Picks from Yahoo News
To fetch the top articles from Yahoo Japan News, use Puppeteer with an anti-detection configuration.
Target URL
https://news.yahoo.co.jp/topics/top-picks
Key Steps
Initialize Puppeteer with these Chromium launch args (they relax sandboxing and shared-memory use, which container environments often need):
--no-sandbox --disable-setuid-sandbox --disable-dev-shm-usage
Set viewport and user agent to appear as a real browser:
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...');
Navigate with network idle:
await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
Wait for dynamic content (2 seconds).
Extract articles using the selector:
.newsFeed_list li a[href*="/pickup/"]
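The steps above can be sketched as a single helper. This is a minimal sketch under stated assumptions, not a definitive implementation: `buildLaunchOptions`, `sleep`, and `openTopPicks` are hypothetical names, `headless: true` is an assumption the source does not state, the `puppeteer` module is passed in by the caller, and the user agent is a parameter because the source elides the full string.

```javascript
// Hypothetical helper names; only the Puppeteer calls come from the steps above.
const TOP_PICKS_URL = 'https://news.yahoo.co.jp/topics/top-picks';

// Launch options with the three flags from the steps, as separate array entries.
function buildLaunchOptions() {
  return {
    headless: true, // ASSUMPTION: the source does not state a headless mode
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
  };
}

// Promise-based delay used for the fixed wait.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `puppeteer` is the imported module; `userAgent` is the full UA string,
// supplied by the caller because the source truncates it.
async function openTopPicks(puppeteer, userAgent) {
  const browser = await puppeteer.launch(buildLaunchOptions());
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1920, height: 1080 });
    await page.setUserAgent(userAgent);
    await page.goto(TOP_PICKS_URL, { waitUntil: 'networkidle2', timeout: 30000 });
    await sleep(2000); // wait for dynamic content
    return { browser, page };
  } catch (err) {
    await browser.close(); // avoid leaking the browser if setup fails
    throw err;
  }
}
```

The caller keeps the returned `browser` and is responsible for calling `await browser.close()` once extraction is done.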
Code Example
const articles = await page.evaluate((maxArticles) => {
  // Pickup links in the top-picks feed; the numeric id follows /pickup/ in the URL.
  const links = document.querySelectorAll('.newsFeed_list li a[href*="/pickup/"]');
  const results = [];
  for (const link of links) {
    // Stop scanning once the requested number of articles is collected.
    if (results.length >= maxArticles) break;
    const url = link.href;
    const match = url.match(/\/pickup\/(\d+)/);
    if (match) {
      results.push({ id: match[1], url, title: link.textContent.trim() });
    }
  }
  return results;
}, 25);
Notes
- A maximum of 25 articles is recommended
- Deduplicate URLs using a Set (the code example does not do this on its own)
- Returns basic metadata only (id, url, title)
- Titles need their timestamps stripped after extraction
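
The deduplication and title cleanup from the notes could look like the following sketch. `dedupeByUrl` and `stripTimestamp` are hypothetical helpers, the sample data is made up, and the timestamp pattern (a trailing date and time such as `1/23(木) 12:34`) is an assumption about how the text appears in titles; adjust the regular expression to match what the page actually emits.

```javascript
// Keep the first article seen for each URL, tracking seen URLs in a Set.
function dedupeByUrl(articles) {
  const seen = new Set();
  return articles.filter((article) => {
    if (seen.has(article.url)) return false;
    seen.add(article.url);
    return true;
  });
}

// ASSUMPTION: titles end with a timestamp like "1/23(木) 12:34".
const TRAILING_TIMESTAMP = /\s*\d{1,2}\/\d{1,2}\([月火水木金土日]\)\s*\d{1,2}:\d{2}\s*$/;

// Remove the assumed trailing timestamp; titles without one are returned unchanged.
function stripTimestamp(title) {
  return title.replace(TRAILING_TIMESTAMP, '').trim();
}

// Made-up sample: the same pickup link appears twice.
const sample = [
  { id: '1', url: 'https://news.yahoo.co.jp/pickup/1', title: 'Example headline 1/23(木) 12:34' },
  { id: '1', url: 'https://news.yahoo.co.jp/pickup/1', title: 'Example headline 1/23(木) 12:34' },
];

const cleaned = dedupeByUrl(sample).map((a) => ({ ...a, title: stripTimestamp(a.title) }));
console.log(cleaned);
```

Both helpers run in Node on the array returned by `page.evaluate`, so the in-page extraction code stays small.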