This post walks you through a single-file PHP mini auditor that fetches any webpage and extracts key SEO signals (title, meta description, keywords, canonical, H1, Open Graph, Twitter/X, GA4 IDs, and internal/external link counts) using only cURL and DOMDocument + XPath. It works in both browser and CLI modes.
Parse HTML with DOMDocument and DOMXPath.
Detect GA4 measurement IDs (G-XXXXXXXX).
Input: the target URL comes from ?url=... in the browser or $argv[1] in CLI. We trim the value and ensure it starts with http:// or https://.
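The input step above can be sketched as a small function. This is only an illustration: `normalize_target_url()` is a hypothetical helper name, not something the script defines, and it applies the same trim-and-prefix rule as the full listing.

```php
<?php
// Hypothetical helper (not part of mini_auditor.php): falls back to a default
// URL and prepends https:// when no http(s) scheme is present.
function normalize_target_url(?string $raw, string $default): string
{
    $url = trim((string) $raw);
    if ($url === '') {
        return $default;
    }
    if (!preg_match('#^https?://#i', $url)) {
        // ltrim() also drops the leading slashes of protocol-relative input.
        $url = 'https://' . ltrim($url, '/');
    }
    return $url;
}

$default = 'https://www.plus2net.com/python/set.php';

// CLI reads the first argument; a web request reads ?url=...
$raw = (PHP_SAPI === 'cli') ? ($argv[1] ?? null) : ($_GET['url'] ?? null);
echo normalize_target_url($raw, $default), "\n";
```

Note that this only fixes the scheme. It does not validate the host, so a public deployment would still need its own restrictions on which URLs may be fetched.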
Parsing: DOMDocument loads the HTML; DOMXPath lets us query it, much like bs4’s find().
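To see the parsing step in isolation, here is a minimal sketch that runs DOMDocument and DOMXPath against an inline HTML string. The sample markup is invented for the example; the real script parses whatever cURL fetched.

```php
<?php
// Sample markup invented for this sketch.
$html = '<html><head><title> Demo page </title>'
      . '<meta name="Description" content="A short summary."></head>'
      . '<body><h1>Main heading</h1><h1>Second heading</h1></body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Text of the first <title>, or null when the page has none.
$titleNode = $xpath->query('//title')->item(0);
$title = $titleNode ? trim($titleNode->textContent) : null;

// XPath 1.0 has no lower-case(), so translate() folds the attribute value
// to lowercase before comparing it.
$lower = "translate(@name,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')";
$descNode = $xpath->query("//meta[$lower='description']")->item(0);
$description = $descNode ? trim($descNode->getAttribute('content')) : null;

// Collect the text of every <h1>.
$h1s = [];
foreach ($xpath->query('//h1') as $h) {
    $h1s[] = trim($h->textContent);
}

echo "Title: $title\nDescription: $description\nH1 count: " . count($h1s) . "\n";
```

The `translate()` trick is why the queries in the full listing look long: it makes `name="Description"` and `name="description"` match the same expression.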
Head tags: //title, //meta[@name='description'] (fallback to og:description), //meta[@name='keywords'], //meta[@name='viewport'], //link[@rel='canonical'].
Headings: all //h1 nodes.
Social tags: Open Graph via @property='og:*', Twitter/X via @name='twitter:*' or 'x:*'.
GA4 IDs: a regex on the raw HTML, /G-[A-Z0-9]{6,12}/.
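The GA4 pattern can be tried on its own. In this sketch the HTML snippet and both IDs are made-up placeholders, and `find_ga4_ids()` is a hypothetical wrapper around the same `preg_match_all()` call the script uses.

```php
<?php
// Made-up snippet with placeholder IDs, for illustration only.
$html = '<script async src="https://www.googletagmanager.com/gtag/js?id=G-ABC123XYZ9"></script>'
      . "<script>gtag('config', 'G-ABC123XYZ9'); gtag('config', 'G-QRS456TUV0');</script>";

// Returns the unique GA4-looking IDs found in the raw HTML, in order of first appearance.
function find_ga4_ids(string $html): array
{
    if (!preg_match_all('/G-[A-Z0-9]{6,12}/', $html, $m)) {
        return [];
    }
    // array_unique() keeps the original keys, so reindex from 0.
    return array_values(array_unique($m[0]));
}

print_r(find_ga4_ids($html));
```

Because the pattern is not anchored to word boundaries, any uppercase text that happens to contain `G-` followed by six or more letters or digits will also match, so treat the result as a hint rather than proof that GA4 is installed.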
Links: for each <a href>, compare its host against the target URL’s host to count internal vs external.
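The host comparison can be sketched as a single function. `classify_link()` is a hypothetical name used only here, and skipping `mailto:` and `tel:` links is an assumption of this sketch about what should not count as a page link.

```php
<?php
// Hypothetical helper: returns 'internal', 'external', or null for links
// that should not be counted at all.
function classify_link(string $href, string $baseHost): ?string
{
    $href = trim($href);
    // Skip empty links, in-page anchors, and non-navigational schemes.
    if ($href === '' || preg_match('~^(#|javascript:|mailto:|tel:)~i', $href)) {
        return null;
    }
    // Relative URLs have no host, so parse_url() returns null for them.
    $host = parse_url($href, PHP_URL_HOST);
    if (!$host || strtolower($host) === strtolower($baseHost)) {
        return 'internal';
    }
    return 'external';
}

$baseHost = 'www.plus2net.com';
$samples = ['/python/list.php', 'https://example.com/', '#top', '//cdn.example.com/lib.js'];
foreach ($samples as $href) {
    echo $href, ' => ', classify_link($href, $baseHost) ?? 'skipped', "\n";
}
```

The comparison is an exact host match, so `plus2net.com` and `www.plus2net.com` are treated as different sites. Whether that is desirable depends on how the audited site handles its subdomains.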
Output: if (PHP_SAPI === 'cli') prints a compact text summary; otherwise we render an HTML report.
CLI:
php mini_auditor.php "https://www.plus2net.com/python/set.php"
Browser:
mini_auditor.php?url=https://www.plus2net.com/python/set.php
For a public demo page, remove the ?url= option and hardcode one URL. This prevents visitors and other sites from using your tool to fetch arbitrary pages.
<?php
$DEFAULT_URL = "https://www.plus2net.com/python/set.php"; // fixed
$targetUrl = $DEFAULT_URL; // ignore $_GET['url'] and CLI args
// ... (reuse the same cURL + DOM parsing code, just skip reading ?url)
?>
Save as mini_auditor.php. Ensure cURL is enabled in PHP.
<?php
/**
* mini_auditor.php
* Minimal PHP webpage auditor (DOMDocument + XPath, no external libs).
* - Browser: mini_auditor.php?url=https://www.plus2net.com/python/set.php
* - CLI: php mini_auditor.php "https://www.plus2net.com/python/set.php"
*/
// ===== 1) Config =====
$DEFAULT_URL = "https://www.plus2net.com/python/set.php";
$UA = "Mozilla/5.0 (compatible; Plus2net-PHP-Auditor/1.0; +https://www.plus2net.com/)";
$TIMEOUT = 20;
// ===== 2) Resolve target URL =====
$targetUrl = $DEFAULT_URL;
if (PHP_SAPI === 'cli') {
if (!empty($argv[1])) $targetUrl = $argv[1];
} else {
if (!empty($_GET['url'])) $targetUrl = $_GET['url'];
}
$targetUrl = trim($targetUrl);
if (!preg_match('#^https?://#i', $targetUrl)) {
$targetUrl = "https://" . ltrim($targetUrl, "/");
}
// ===== 3) Fetch HTML via cURL =====
function fetch_html($url, $ua, $timeout) {
$ch = curl_init($url);
curl_setopt_array($ch, [
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_USERAGENT => $ua,
CURLOPT_CONNECTTIMEOUT => $timeout,
CURLOPT_TIMEOUT => $timeout,
CURLOPT_SSL_VERIFYPEER => true,
CURLOPT_SSL_VERIFYHOST => 2,
CURLOPT_HEADER => false,
]);
$html = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
$err = curl_error($ch);
curl_close($ch);
return [$status, $html, $err];
}
list($status, $html, $err) = fetch_html($targetUrl, $UA, $TIMEOUT);
// ===== 4) Parse with DOMDocument + XPath =====
libxml_use_internal_errors(true);
$dom = new DOMDocument();
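// Skip parsing when the fetch failed or returned an empty body
// (loadHTML() rejects an empty string on PHP 8).
if (is_string($html) && $html !== '') {
$dom->loadHTML($html); // libxml warnings are already suppressed above
}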
$xpath = new DOMXPath($dom);
function firstAttr(DOMNodeList $nodes, $attr) {
if ($nodes->length > 0) {
$n = $nodes->item(0);
return trim($n->getAttribute($attr) ?? "");
}
return null;
}
function firstNodeText(DOMNodeList $nodes) {
if ($nodes->length > 0) {
return trim($nodes->item(0)->textContent ?? "");
}
return null;
}
function q($xpath, $expr) { return $xpath->query($expr); }
$parts = parse_url($targetUrl);
$baseHost = isset($parts['host']) ? strtolower($parts['host']) : "";
// Extract fields
$title = firstNodeText(q($xpath, "//title"));
$metaDescription = firstAttr(q($xpath, "//meta[translate(@name,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='description']"), "content");
if (!$metaDescription) {
$metaDescription = firstAttr(q($xpath, "//meta[translate(@property,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='og:description']"), "content");
}
$metaKeywords = firstAttr(q($xpath, "//meta[translate(@name,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='keywords']"), "content");
$metaViewport = firstAttr(q($xpath, "//meta[translate(@name,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='viewport']"), "content");
$canonical = firstAttr(q($xpath, "//link[translate(@rel,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='canonical']"), "href");
// H1s
$h1nodes = q($xpath, "//h1");
$h1s = [];
foreach ($h1nodes as $h) { $h1s[] = trim($h->textContent); }
// OG basics
$ogTitle = firstAttr(q($xpath, "//meta[translate(@property,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='og:title']"), "content");
$ogDescription = firstAttr(q($xpath, "//meta[translate(@property,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='og:description']"), "content");
$ogImage = firstAttr(q($xpath, "//meta[translate(@property,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='og:image']"), "content");
$ogPresent = ($ogTitle || $ogDescription || $ogImage) ? 1 : 0;
// Twitter/X presence
$twMeta = q($xpath, "//meta[starts-with(translate(@name,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'twitter:') or starts-with(translate(@name,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'x:')]");
$twitterPresent = ($twMeta->length > 0) ? 1 : 0;
// GA4 IDs (regex on raw HTML)
$ga4Ids = [];
if (!empty($html)) {
if (preg_match_all('/G-[A-Z0-9]{6,12}/', $html, $m)) {
$ga4Ids = array_values(array_unique($m[0]));
}
}
// Links (internal/external)
$internal = 0; $external = 0;
$anodes = q($xpath, "//a[@href]");
foreach ($anodes as $a) {
$href = trim($a->getAttribute("href"));
// Skip empty links, in-page anchors, and non-page schemes (otherwise mailto:/tel: count as internal)
if ($href === "" || preg_match('~^(#|javascript:|mailto:|tel:)~i', $href)) continue;
$ph = parse_url($href, PHP_URL_HOST);
if (!$ph || strtolower($ph) === $baseHost) $internal++; else $external++;
}
// ===== 5) Output (HTML if browser, text if CLI) =====
$isCli = (PHP_SAPI === 'cli');
if ($isCli) {
echo "=== PHP Mini Auditor ===\n";
echo "URL: $targetUrl\n";
echo "HTTP Status: $status\n";
echo "Title: " . ($title ?: "—") . "\n";
echo "Description: " . ($metaDescription ?: "—") . "\n";
echo "Keywords: " . ($metaKeywords ?: "—") . "\n";
echo "Viewport: " . ($metaViewport ?: "—") . "\n";
echo "Canonical: " . ($canonical ?: "—") . "\n";
echo "H1s: " . (count($h1s) ? implode(" | ", $h1s) : "—") . "\n";
echo "OG present: " . ($ogPresent ? "Yes" : "No") . "\n";
echo "Twitter present: " . ($twitterPresent ? "Yes" : "No") . "\n";
echo "GA4 IDs: " . (count($ga4Ids) ? implode(", ", $ga4Ids) : "—") . "\n";
echo "Links: internal=$internal, external=$external\n";
exit;
}
?>
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>PHP Mini Auditor</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
body{font-family:system-ui,-apple-system,Segoe UI,Roboto,Ubuntu,"Helvetica Neue",Arial,sans-serif;margin:20px;line-height:1.6}
.wrap{max-width:980px;margin:auto}
.card{border:1px solid #e1e5ea;border-radius:8px;padding:16px;margin-bottom:16px}
.muted{color:#6c757d}
table{width:100%;border-collapse:collapse}
th,td{border:1px solid #e1e5ea;padding:8px;vertical-align:top}
th{background:#f8f9fa;text-align:left}
.ok{color:#1b7f2a}
.warn{color:#9a6700}
.bad{color:#b42318}
input[type=text]{width:100%;padding:8px;border:1px solid #ced4da;border-radius:6px}
button{padding:8px 12px;border-radius:6px;border:1px solid #0d6efd;background:#0d6efd;color:#fff;cursor:pointer}
.row{display:flex;gap:16px;flex-wrap:wrap}
.col{flex:1 1 320px}
code{background:#f6f8fa;padding:2px 4px;border-radius:4px}
</style>
</head>
<body>
<div class="wrap">
<h1>PHP Mini Auditor</h1>
<p class="muted">Quickly inspect key SEO tags of any webpage (no external PHP libraries). Enter a URL or use the default.</p>
<form method="get" class="card">
<label for="url"><strong>URL</strong></label>
<input type="text" id="url" name="url" value="<?php echo htmlspecialchars($targetUrl); ?>">
<div style="margin-top:8px"><button type="submit">Audit</button></div>
</form>
<div class="card">
<h3>Summary</h3>
<div class="row">
<div class="col"><strong>URL:</strong> <?php echo htmlspecialchars($targetUrl); ?></div>
<div class="col"><strong>HTTP Status:</strong> <?php echo (int)$status; ?></div>
<div class="col"><strong>Links:</strong> internal=<?php echo (int)$internal; ?>, external=<?php echo (int)$external; ?></div>
</div>
</div>
<div class="card">
<h3>Meta &amp; Head Tags</h3>
<table>
<tr><th>Title</th><td><?php echo htmlspecialchars($title ?: "—"); ?></td></tr>
<tr><th>Meta Description</th><td><?php echo htmlspecialchars($metaDescription ?: "—"); ?></td></tr>
<tr><th>Meta Keywords</th><td><?php echo htmlspecialchars($metaKeywords ?: "—"); ?></td></tr>
<tr><th>Viewport</th><td><?php echo htmlspecialchars($metaViewport ?: "—"); ?></td></tr>
<tr><th>Canonical</th><td><?php echo htmlspecialchars($canonical ?: "—"); ?></td></tr>
</table>
</div>
<div class="card">
<h3>Headings</h3>
<?php if (count($h1s)): ?>
<ul>
<?php foreach ($h1s as $h): ?>
<li><?php echo htmlspecialchars($h); ?></li>
<?php endforeach; ?>
</ul>
<?php else: ?>
<p class="muted">No &lt;h1&gt; found.</p>
<?php endif; ?>
</div>
<div class="card">
<h3>Social &amp; Analytics</h3>
<table>
<tr><th>Open Graph present</th><td><?php echo $ogPresent ? "<span class='ok'>Yes</span>" : "<span class='bad'>No</span>"; ?></td></tr>
<tr><th>Twitter/X tags present</th><td><?php echo $twitterPresent ? "<span class='ok'>Yes</span>" : "<span class='warn'>No</span>"; ?></td></tr>
<tr><th>GA4 IDs</th><td><?php echo count($ga4Ids) ? htmlspecialchars(implode(", ", $ga4Ids)) : "<span class='warn'>None found</span>"; ?></td></tr>
</table>
</div>
<p class="muted">Tip: From CLI, run <code>php mini_auditor.php "https://example.com"</code> to print a compact text report.</p>
</div>
</body>
</html>
Yes. This script uses only PHP's cURL and DOM extensions and outputs results directly in the browser or CLI. No database is required.
You can create a fixed demo version where the URL is hard-coded inside the script instead of being user-supplied.
Yes. The script checks PHP_SAPI to detect whether it is run from CLI or a web server, and formats the output accordingly.
Absolutely. You can add more regex or DOM parsing logic with the DOMDocument class or preg_match() to fetch tags such as Open Graph, canonical, or schema markup.
Yes. Since the script relies on cURL to fetch external pages, cURL must be enabled in your PHP installation. You can confirm by checking phpinfo().
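As an alternative to scanning the phpinfo() output, a short script can report the same thing. This is a minimal sketch; `curl_status()` is a hypothetical helper name.

```php
<?php
// Reports whether the cURL extension is loaded and, if so, the libcurl version.
function curl_status(): string
{
    if (!extension_loaded('curl') || !function_exists('curl_version')) {
        return 'cURL is NOT enabled';
    }
    $info = curl_version();   // associative array describing the linked libcurl
    return 'cURL is enabled (libcurl ' . $info['version'] . ')';
}

echo curl_status(), "\n";
```

Run it with the same PHP binary or web server that will run the auditor, since CLI and web setups can load different php.ini files.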
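cURL not enabled: enable it in php.ini by uncommenting extension=curl.
SSL certificate errors: you can set curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); in testing, but always configure proper certificates in production.
Timeouts: increase CURLOPT_TIMEOUT (set through $TIMEOUT in the script) or check the server firewall if external requests are blocked.
You can extend the auditor with more checks (robots meta, hreflang, noindex flags), or save results into SQLite/MySQL and export to Excel, just like our Python auditor series.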
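As one possible starting point for those extra checks, the sketch below reads the robots meta tag and hreflang links with the same DOMXPath approach. The sample markup is invented, and the variable names are illustrative rather than part of mini_auditor.php.

```php
<?php
// Sample markup invented for this sketch.
$html = '<html><head>'
      . '<meta name="robots" content="NOINDEX, follow">'
      . '<link rel="alternate" hreflang="en" href="https://example.com/en/">'
      . '<link rel="alternate" hreflang="fr" href="https://example.com/fr/">'
      . '</head><body></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Case-insensitive match on the name attribute, as in the main script.
$lowerName = "translate(@name,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')";
$robotsNode = $xpath->query("//meta[$lowerName='robots']")->item(0);
$robots = $robotsNode ? strtolower(trim($robotsNode->getAttribute('content'))) : '';

// Split "noindex, follow" into individual directives and drop empty entries.
$directives = array_filter(array_map('trim', explode(',', $robots)), 'strlen');
$noindex = in_array('noindex', $directives, true);

// Map each hreflang code to its alternate URL.
$hreflangs = [];
foreach ($xpath->query('//link[@hreflang][@href]') as $link) {
    $hreflangs[$link->getAttribute('hreflang')] = $link->getAttribute('href');
}

echo 'noindex: ', ($noindex ? 'Yes' : 'No'), "\n";
print_r($hreflangs);
```

This only inspects the HTML. A noindex sent through the X-Robots-Tag HTTP header would need the response headers from cURL, which the current fetch function does not return.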