# AI Smart Ventures — Site Health & Storage Audit

Hi team,

We ran a full storage and performance audit of aismartventures.com this week. Here's what we found, what we cleaned up, and a few decisions we'd like to align with you on.

---

## 1. Storage cleanup — done

The site had grown to **36 GB** of disk usage. We've taken it down to **~17 GB** with no visible impact to the site. Here's the breakdown:

| Cleanup | Saved |
|---|---|
| Trimmed UpdraftPlus backup retention (was keeping 82 daily backups going back 6 months) | ~13 GB |
| Removed 12 unused video files | 0.4 GB |
| Removed 1,489 unused image attachments not referenced anywhere on the site | ~4.5 GB |
| Optimised 159 oversized JPEGs (avg 89% smaller, same dimensions, no visible quality loss) | 0.4 GB |
| Optimised 736 oversized PNGs (avg 80% smaller, capped at 1920px wide) | 1.9 GB |
| **Total reclaimed** | **~20 GB (55% of total)** |

Every deleted file was backed up first, and we verified the live site by crawling all 498 pages — **zero broken images caused by the cleanup**. 
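For anyone who wants to repeat this kind of check, here is a minimal sketch of the idea in Python. It is not the tool we used for the crawl. It assumes you already have each page's HTML and a set of upload paths known to exist; the names `ImageCollector`, `missing_images`, and the sample data are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_images(page_url, html, existing_paths):
    """Return image paths referenced by the page that are not in existing_paths."""
    collector = ImageCollector()
    collector.feed(html)
    missing = []
    for src in collector.sources:
        # Resolve relative references against the page URL, then compare paths only.
        path = urlparse(urljoin(page_url, src)).path
        if path not in existing_paths:
            missing.append(path)
    return missing


# Hypothetical sample data, for illustration only.
page = "https://example.com/posts/sample/"
html = '<p><img src="/wp-content/uploads/a.jpg"><img src="b.png"></p>'
on_disk = {"/wp-content/uploads/a.jpg"}
print(missing_images(page, html, on_disk))  # ['/posts/sample/b.png']
```

A real crawl would also need to cover `srcset`, CSS backgrounds, and video sources, which this sketch ignores.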

We also reset the WordPress favicon (it had been pointing to a deleted attachment) and installed a static `/favicon.ico` at the site root to stop ~86 stray 404s per week.

---

## 2. Pre-existing broken assets we found (for your team to fix)

These were already broken before our work: each asset is referenced from a published post, but the file itself is missing from the server.

| Post | Broken assets |
|---|---|
| `/posts/ai-makes-new-antibiotics-and-googles-job-creation-vision/` | 4 missing JPEGs (UUID filenames in 2025/08/) |
| `/posts/claude-3-5-pricing-googles-jarvis-you-coms-election-agent...` | `processed2xspeed_apartment_laundry_10min.mp4` |
| `/posts/level-up-your-career-with-the-applied-ai-course/` | `AISM-Reels.mp4` |

These look like they were lost during a past site restore. Easy fix: edit each post and either re-upload the asset or remove the embed.

---

## 3. Recommended 301 redirects for old/dead URLs

External links and bots keep requesting URLs that no longer exist. None of them are in your sitemap, but each one shows up as repeated 404s in the access logs. We recommend adding the redirects below in the **Redirection** plugin (already installed); each takes about 30 seconds to set up:

| Old URL | Redirect to | Hits/week |
|---|---|---|
| `/who-we-are` | `/about/` | 17 |
| `/our-story` | `/about/` | 17 |
| `/profile` | `/about/` | 17 |
| `/company-profile` | `/about/` | 18 |
| `/company` | `/about/` | 17 |
| `/speaker-page` | `/speaking/` | 24 |
| `/posts/blog/` | `/posts/` | 22 |
| `/category/blog/` | `/posts/` | 19 |
| `/posts/tag/make/` | `/posts/` | 18 |

There are also two dead URLs with emoji slugs (`%F0%9F%92%A1...rank-higher-in-2025...` and `%F0%9F%8C%8D...education-revolution...`) that look like deleted blog posts. We can either redirect them to `/posts/` or let them 404; your call. The same goes for `/posts/making-it-big-on-myspace-the-arctic-monkeys-story/`, which does not look like anything your team ever published.
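If your team wants to re-run the 404 tally later, the sketch below shows one way to do it in Python. It assumes the access log is in the common "combined" format; the regex, the function name `count_404s`, and the sample lines are illustrative assumptions, not an export from your server.

```python
import re
from collections import Counter

# Matches the request and status fields of a combined-format log line.
LINE_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_404s(log_lines):
    """Count 404 responses per path, ignoring query strings and trailing slashes."""
    hits = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if match and match.group("status") == "404":
            path = match.group("path").split("?", 1)[0].rstrip("/") or "/"
            hits[path] += 1
    return hits


# Hypothetical log lines, for illustration only.
sample = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /who-we-are HTTP/1.1" 404 512 "-" "bot"',
    '203.0.113.6 - - [01/Jan/2025:00:00:02 +0000] "GET /who-we-are/ HTTP/1.1" 404 512 "-" "bot"',
    '203.0.113.7 - - [01/Jan/2025:00:00:03 +0000] "GET /about/ HTTP/1.1" 200 2048 "-" "browser"',
]
for path, count in count_404s(sample).most_common():
    print(path, count)  # /who-we-are 2
```

Sorting by count makes the best redirect candidates float to the top; anything with only one or two hits is usually not worth a rule.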

---

## 4. Security: active probing and brute-force attempts

Cloudflare's Super Bot Fight Mode is active and doing some work, but we found **two specific bad actors that should be added to your Cloudflare WAF block list**:

1. **`104.238.222.26`**: made 1,783 requests against `/wp-admin/` and `/wp-login.php` over the week. This is a classic credential-stuffing pattern. No logins succeeded, but the requests hit the origin hard enough that WordPress returned 591 HTTP 503 (overload) responses.
2. **`185.177.72.56`**: made 1,135 requests probing for the PHPUnit RCE vulnerability and JBoss exploits. Neither vulnerability is present on your site (every request returned 404), but the traffic wastes bandwidth.

Five separate IPs probed for the `/.env` file (where credentials sometimes leak) over the week. Every request returned 404, so **no `.env` file is exposed**, but it is still worth adding a generic WAF rule that blocks any `/.env` request outright.
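As a rough illustration of what those WAF rules could look like, the Python sketch below assembles a single Cloudflare custom-rule expression from the two IPs and the `/.env` path. The helper name `build_block_expression` is hypothetical, and the expression should be checked in the Cloudflare dashboard's expression editor before it is deployed with a Block action.

```python
import ipaddress


def build_block_expression(ips, path_fragments):
    """Build a rule expression matching any of the given IPs or path fragments."""
    # Validate each address so a typo fails loudly instead of producing a bad rule.
    checked = [str(ipaddress.ip_address(ip)) for ip in ips]
    clauses = []
    if checked:
        clauses.append("(ip.src in {" + " ".join(checked) + "})")
    for fragment in path_fragments:
        clauses.append(f'(http.request.uri.path contains "{fragment}")')
    return " or ".join(clauses)


expr = build_block_expression(["104.238.222.26", "185.177.72.56"], ["/.env"])
print(expr)
# (ip.src in {104.238.222.26 185.177.72.56}) or (http.request.uri.path contains "/.env")
```

Note that `contains "/.env"` would also match variants such as `/.env.bak`, which is probably what you want for this kind of probe.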

---

## 5. Bot decisions we'd like your input on

Your site sees a LOT of bot traffic. Knowing your business (AI consultancy that wants discoverability in AI tools), here's our recommendation per category:

**Definitely keep (allow + welcome):**
- Googlebot, Bingbot, Applebot — major search engines
- ChatGPT-User, PerplexityBot, OAI-SearchBot — AI assistants. Currently your #2 traffic source after WordPress itself; you want this.
- SemrushBot — you're a Semrush customer, this is your own tool

**Recommend blocking (low business value, high bandwidth cost):**
- **AhrefsBot** (864/wk) — competitor SEO data tool. You already pay for Semrush which gives you the same data.
- **MJ12bot** (610/wk) — Majestic SEO crawl, similar story.
- **Barkrowler** (395/wk) — Babbar (French SEO tool), no clear value.
- **PetalBot** (1,402/wk) — Huawei search; ~0% of your traffic comes from Huawei users.
- **ClueWeb-Crawler** (859/wk) — academic research dataset. No commercial value.

**Up to you (judgement call):**
- **Bytespider + TikTokSpider** (~1,273/wk) — ByteDance's training crawler for their own AI models. Your call whether you want ByteDance's AI products to know about you.

If you give us the green light, we can add the blocks via `robots.txt` (polite but voluntary) and/or Cloudflare WAF rules (enforced).
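To make the `robots.txt` option concrete, here is a small Python sketch that generates the block groups and then checks them with the standard library's robots parser. The user-agent tokens are assumed to match the bot names listed above and should be confirmed against each crawler's own documentation. The output is meant to be appended to your existing `robots.txt`, not to replace it.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens, taken from the bot names in this report.
BLOCKED_BOTS = ["AhrefsBot", "MJ12bot", "Barkrowler", "PetalBot", "ClueWeb-Crawler"]


def build_robots_txt(blocked):
    """Return one 'Disallow: /' group per blocked crawler."""
    lines = []
    for bot in blocked:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    return "\n".join(lines)


robots_txt = build_robots_txt(BLOCKED_BOTS)

# Sanity check: blocked bots are refused, everyone else is unaffected.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("AhrefsBot", "/posts/"))  # False
print(parser.can_fetch("Googlebot", "/posts/"))  # True
```

Because `robots.txt` is voluntary, a crawler that ignores it would still need a Cloudflare WAF rule.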

---

## 6. Practices going forward — for the content team

A few patterns we noticed that, if changed at the source, will keep the site lean automatically. None of these are urgent — just easy upgrades to your team's workflow:

1. **Don't upload Gemini AI images as PNG.** Gemini's "Download" button gives you a 5–10 MB lossless PNG. **Save as JPEG quality 85 before uploading** — same image, ~90% smaller. (We found 2,856 of these on the site totalling 5.85 GB before optimisation.)
2. **Reuse images from the media library instead of re-uploading.** We found 1,433 files with `-1`, `-2`, `-3` suffixes — that's the team uploading the same image multiple times rather than finding it in the library. Worth a quick training note.
3. **Don't host videos in WordPress.** Use YouTube or Vimeo and embed. We found a 263 MB `.mov` file that was uploaded once and never used. That is most likely an Apple ProRes or similar editing-format export, which does not belong on a web server.
4. **Filenames matter.** Generic names like `image-22.png`, `Gemini_Generated_Image_haoforhaoforhaof.png`, `1080p.mp4`, and `4.2.mp4` make it impossible to find anything later, which leads straight back to re-uploading. A 5-second rename at upload time saves hours later.
5. **Page title length.** Screaming Frog flagged 1,494 of your pages (62%) with titles over 60 characters, including the homepage. Search engines typically truncate titles at around 60 characters, so the back half of those titles isn't visible in search results. Worth a content pass.
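Two of the checks above (points 2 and 5) are easy to script. The Python sketch below is a simplified illustration, not the exact method behind the numbers in this report: it flags a file as a likely re-upload only when the same name without the `-N` suffix also exists, and it flags titles over a character limit. The function names and sample data are hypothetical.

```python
import re
from pathlib import PurePosixPath

# A stem ending in "-1" .. "-99"; thumbnail names like "photo-150x150" do not match.
SUFFIX_RE = re.compile(r"^(?P<base>.+)-(?P<n>\d{1,2})$")


def likely_reuploads(filenames):
    """Return files whose name is another existing file's name plus a -N suffix."""
    names = set(filenames)
    flagged = []
    for name in sorted(names):
        path = PurePosixPath(name)
        match = SUFFIX_RE.match(path.stem)
        if match and str(path.with_name(match.group("base") + path.suffix)) in names:
            flagged.append(name)
    return flagged


def overlong_titles(titles, limit=60):
    """Return titles longer than the given character limit."""
    return [title for title in titles if len(title) > limit]


# Hypothetical sample data, for illustration only.
files = ["team-photo.png", "team-photo-1.png", "team-photo-2.png", "image-22.png"]
print(likely_reuploads(files))  # ['team-photo-1.png', 'team-photo-2.png']
```

Requiring the unsuffixed original to exist keeps names like `image-22.png` from being flagged by mistake, at the cost of missing duplicates whose original has since been deleted.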

---

## What we'd like to do next

Here is where things stand and what we need from you:

1. ✅ **Already done**: 20 GB cleanup, image optimisation, favicon restore
2. ⏳ **Pending your input**: which bots to block (see section 5)
3. ⏳ **Pending your input**: 301 redirects for the dead URLs (see section 3)
4. **Recommended ongoing**: install ShortPixel or Imagify on WordPress to auto-optimise new uploads and serve WebP — this turns the cleanup we just did into a permanent improvement

Happy to walk through any of this on a call.

Thanks,
{your name}
