www.bortolotto.eu

Newsfeeds
Planet MySQL
Planet MySQL - https://planet.mysql.com

  • MySQL 8.0 JSON Functions: Practical Examples and Indexing
    This post covers a hands-on walkthrough of MySQL 8.0's JSON functions. JSON support has been in MySQL since 5.7, but 8.0 added a meaningful set of improvements (better indexing strategies, new functions, and multi-valued indexes) that make working with JSON data considerably more practical. The following documents several of the most commonly needed patterns, including EXPLAIN output and performance observations worth knowing about.

    This isn't a "JSON vs. relational" debate post. If you're storing JSON in MySQL, you probably already have your reasons. The goal here is to make sure you're using the available tooling effectively.

    Environment

    mysql> SELECT @@version, @@version_comment\G
    *************************** 1. row ***************************
            @@version: 8.0.36
    @@version_comment: MySQL Community Server - GPL

    Testing was done on a VM with 8GB RAM and innodb_buffer_pool_size set to 4G.

    One housekeeping note worth mentioning: query_cache_type is irrelevant in 8.0 because the query cache was removed entirely. If you migrated a 5.7 instance and still have that variable in your my.cnf, remove it; MySQL 8.0 will refuse to start with an unknown-variable error.
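As an illustration of the kind of leftover 5.7 settings that block an 8.0 startup, a hypothetical migrated my.cnf might still carry lines like these (both variables were removed along with the query cache, and every such line must go):

```ini
# Hypothetical my.cnf fragment carried over from a 5.7 instance.
# MySQL 8.0 no longer recognizes these variables and will abort
# startup with an "unknown variable" error -- delete them.
[mysqld]
query_cache_type = 1
query_cache_size = 64M
```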
    Setting Up a Test Table

    The test table simulates a fairly common pattern: an application storing user profile data and event metadata as JSON blobs:

    CREATE TABLE user_events (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        user_id INT UNSIGNED NOT NULL,
        event_data JSON NOT NULL,
        created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (id),
        INDEX idx_user (user_id)
    ) ENGINE=InnoDB;

    INSERT INTO user_events (user_id, event_data) VALUES
    (1, '{"action":"login","ip":"192.168.1.10","tags":["mobile","vpn"],"score":88}'),
    (1, '{"action":"purchase","ip":"192.168.1.10","tags":["desktop"],"score":72,"amount":49.99}'),
    (2, '{"action":"login","ip":"10.0.0.5","tags":["mobile"],"score":91}'),
    (3, '{"action":"logout","ip":"10.0.0.9","tags":["desktop","vpn"],"score":65}'),
    (2, '{"action":"purchase","ip":"10.0.0.5","tags":["mobile"],"score":84,"amount":129.00}');

    Basic Extraction: JSON_VALUE vs. JSON_EXTRACT

    JSON_VALUE() was introduced in MySQL 8.0.21 and is the cleaner way to extract scalar values with built-in type casting. Before that, you were using JSON_EXTRACT() (or the -> shorthand) and casting manually, which works but adds noise to your queries.

    -- Pre-8.0.21 approach
    SELECT user_id,
           JSON_EXTRACT(event_data, '$.action') AS action,
           CAST(JSON_EXTRACT(event_data, '$.score') AS UNSIGNED) AS score
    FROM user_events;

    -- Cleaner 8.0.21+ approach
    SELECT user_id,
           JSON_VALUE(event_data, '$.action') AS action,
           JSON_VALUE(event_data, '$.score' RETURNING UNSIGNED) AS score
    FROM user_events;

    Output from the second query:

    +---------+----------+-------+
    | user_id | action   | score |
    +---------+----------+-------+
    |       1 | login    |    88 |
    |       1 | purchase |    72 |
    |       2 | login    |    91 |
    |       3 | logout   |    65 |
    |       2 | purchase |    84 |
    +---------+----------+-------+
    5 rows in set (0.00 sec)

    The RETURNING clause is genuinely useful. It eliminates the awkward double-cast pattern and makes intent clearer when reading query code later.
    Multi-Valued Indexes: The Real Game Changer

    This is where 8.0 actually moved the needle for JSON workloads. Multi-valued indexes, available since MySQL 8.0.17, let you index array elements inside a JSON column directly. Here's what that looks like in practice:

    ALTER TABLE user_events
        ADD INDEX idx_tags ((CAST(event_data->'$.tags' AS CHAR(64) ARRAY)));

    Here is what EXPLAIN shows before and after on a query filtering by tag value:

    -- Without the multi-valued index:
    EXPLAIN SELECT * FROM user_events
    WHERE JSON_CONTAINS(event_data->'$.tags', '"vpn"')\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: user_events
       partitions: NULL
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 5
         filtered: 100.00
            Extra: Using where

    -- After adding the multi-valued index:
    EXPLAIN SELECT * FROM user_events
    WHERE JSON_CONTAINS(event_data->'$.tags', '"vpn"')\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: user_events
       partitions: NULL
             type: range
    possible_keys: idx_tags
              key: idx_tags
          key_len: 67
              ref: NULL
             rows: 2
         filtered: 100.00
            Extra: Using where

    Full table scan down to a range scan. On 5 rows this is trivial, but on a table with millions of rows and frequent tag-based filtering, that difference is significant. The improvement scales directly with table size and query frequency.

    One important gotcha: MEMBER OF() and JSON_OVERLAPS() also benefit from multi-valued indexes, but JSON_SEARCH() does not.
    This matters when choosing your query pattern at design time:

    -- This WILL use the multi-valued index:
    SELECT * FROM user_events WHERE 'vpn' MEMBER OF (event_data->'$.tags');

    -- This will NOT use it:
    SELECT * FROM user_events
    WHERE JSON_SEARCH(event_data->'$.tags', 'one', 'vpn') IS NOT NULL;

    Aggregating and Transforming JSON

    A few aggregation functions worth knowing well:

    -- Build a JSON array of actions per user
    SELECT user_id,
           JSON_ARRAYAGG(JSON_VALUE(event_data, '$.action')) AS actions
    FROM user_events
    GROUP BY user_id;

    +---------+----------------------+
    | user_id | actions              |
    +---------+----------------------+
    |       1 | ["login","purchase"] |
    |       2 | ["login","purchase"] |
    |       3 | ["logout"]           |
    +---------+----------------------+
    3 rows in set (0.01 sec)

    -- Summarize into a JSON object keyed by action
    SELECT user_id,
           JSON_OBJECTAGG(
               JSON_VALUE(event_data, '$.action'),
               JSON_VALUE(event_data, '$.score' RETURNING UNSIGNED)
           ) AS score_by_action
    FROM user_events
    GROUP BY user_id;

    +---------+-------------------------------+
    | user_id | score_by_action               |
    +---------+-------------------------------+
    |       1 | {"login": 88, "purchase": 72} |
    |       2 | {"login": 91, "purchase": 84} |
    |       3 | {"logout": 65}                |
    +---------+-------------------------------+
    3 rows in set (0.00 sec)

    JSON_OBJECTAGG() will throw an error if there are duplicate keys within a group. This is worth knowing before you encounter it in a production ETL pipeline. In that case, you'll need to deduplicate upstream or handle it in application logic before the data reaches this aggregation step.
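The upstream deduplication mentioned above can be sketched in application code. A minimal sketch, assuming rows arrive as (user_id, action, score) tuples (a hypothetical shape, not from the post) and that the most recently seen duplicate should win:

```python
# Sketch: collapse duplicate (user_id, action) pairs before the data reaches
# a JSON_OBJECTAGG()-style aggregation, so duplicate keys cannot occur.
# The (user_id, action, score) row shape is an assumption for illustration.

def dedupe_for_objectagg(rows):
    """Keep the last score seen for each (user_id, action) pair,
    regrouped as {user_id: {action: score}}."""
    latest = {}
    for user_id, action, score in rows:
        latest[(user_id, action)] = score  # later rows overwrite earlier ones
    by_user = {}
    for (user_id, action), score in latest.items():
        by_user.setdefault(user_id, {})[action] = score
    return by_user

rows = [
    (1, "login", 88),
    (1, "purchase", 72),
    (1, "login", 90),  # duplicate (1, "login") key within the group
]
print(dedupe_for_objectagg(rows))
# -> {1: {'login': 90, 'purchase': 72}}
```

Whether "last wins" is the right policy depends on the pipeline; the point is only that the choice is made explicitly before the grouped aggregation runs.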
    Checking SHOW STATUS After JSON-Heavy Queries

    When evaluating query patterns, checking handler metrics is a useful habit:

    FLUSH STATUS;
    SELECT * FROM user_events
    WHERE JSON_VALUE(event_data, '$.score' RETURNING UNSIGNED) > 80;
    SHOW STATUS LIKE 'Handler_read%';

    +-----------------------+-------+
    | Variable_name         | Value |
    +-----------------------+-------+
    | Handler_read_first    | 1     |
    | Handler_read_key      | 0     |
    | Handler_read_last     | 0     |
    | Handler_read_next     | 4     |
    | Handler_read_prev     | 0     |
    | Handler_read_rnd      | 0     |
    | Handler_read_rnd_next | 6     |
    +-----------------------+-------+
    7 rows in set (0.00 sec)

    The Handler_read_rnd_next value confirms a full scan; no surprise, since there's no functional index on the score value. For score-based filtering at scale, a generated column with an index is the right answer:

    ALTER TABLE user_events
        ADD COLUMN score_val TINYINT UNSIGNED
            GENERATED ALWAYS AS (JSON_VALUE(event_data, '$.score' RETURNING UNSIGNED)) VIRTUAL,
        ADD INDEX idx_score (score_val);

    After adding that, the same query drops to a proper index range scan. Generated columns on JSON fields are available in both MySQL 8.0 and Percona Server 8.0, and they remain the most reliable path for scalar JSON field filtering at any meaningful scale.

    If you're running Percona Server, pt-query-digest from the Percona Toolkit is still the most practical way to identify which JSON-heavy queries are actually causing pain in production before you start adding indexes speculatively.
    Practical Observations

    - Multi-valued indexes (8.0.17+) are a long overdue improvement and work well when your query patterns align with JSON_CONTAINS() or MEMBER OF().
    - JSON_VALUE() with RETURNING (8.0.21+) is cleaner than the old cast-after-extract pattern and worth adopting consistently.
    - Generated columns plus indexes remain the most reliable path for scalar JSON field filtering at scale.
    - Watch for JSON_OBJECTAGG() duplicate key errors in grouped data; they surface as hard errors in ETL pipelines and can be easy to miss in testing if your sample data happens to be clean.
    - Always verify index usage with EXPLAIN; the optimizer doesn't always pick up multi-valued indexes in complex WHERE clauses, and it's worth confirming rather than assuming.

    Summary

    MySQL 8.0's JSON improvements are genuinely useful, particularly multi-valued indexes and JSON_VALUE() with type casting. They don't replace good schema design, but for cases where JSON storage is appropriate or inherited, you now have real tools to work with rather than just hoping the optimizer figures it out. The generated column pattern in particular is worth evaluating early if you know certain JSON fields will be used in WHERE clauses regularly.

    Useful references:
    - MySQL 8.0 JSON Function Reference
    - Multi-Valued Indexes Documentation
    - JSON_VALUE() Function Reference
    - Percona Toolkit

  • An Open Letter to Oracle: Let’s Talk About MySQL’s Future
    What Happened at the Summits We just wrapped up two MySQL Community Summits – one in San Francisco in January, and one in Brussels right before FOSDEM. The energy in the rooms: a lot of people who care deeply about MySQL got together, exchanged ideas, and left with a clear sense that we need to […]

  • New Era of MySQL Community Engagement
    As we mark the milestone of MySQL’s 30-year anniversary, the celebrations around the globe have been a testament to the widespread impact and popularity of the Dolphin. This week’s annual preFOSDEM MySQL Belgian Days in Brussels are not only an opportunity to celebrate, but also a chance for Oracle to share some key updates on […]

  • Wireshark now can decode MySQL X Protocol
    The new protocol dissector for X Protocol in MySQL was just merged to the master branch in Wireshark. To try it, build Wireshark from the master branch or wait for the next release. This protocol uses Google Protobuf, which makes it much easier to work with than the regular MySQL protocol. See also: https://dev.mysql.com/doc/dev/mysql-server/latest/page_mysqlx_protocol.html If you like what Wireshark does, consider donating at https://wiresharkfoundation.org/donate/

  • Bringing GenAI to Every MySQL Instance: ProxySQL v4.0
    The Problem with “Just Migrate”

    Most organizations are sitting on MySQL deployments they can’t easily change: a mix of community editions, managed cloud services, and legacy versions. Teams want RAG pipelines and natural language querying, but adding AI capabilities typically means schema migrations, new vector database infrastructure, dual-write synchronization headaches, and AI logic sprawled across every application layer. The operational cost is real, and the governance risk is worse. ProxySQL v4.0 takes a different approach: don’t touch your database at all.

    The Transparent AI Layer

    The core thesis is elegant: put the intelligence at the proxy layer, not in the database or the application. ProxySQL already sits between every client and every MySQL backend, which makes it a natural choke point for centralized governance, auth, auditing, and now AI capabilities. No connection string changes, no schema migrations, no new infrastructure your DBA team has to babysit.

    The comparison with app-layer integration is stark. Where app-layer AI means fragmented governance across every service, schema changes, and multiple network hops, the ProxySQL AI layer provides unified query rules, zero schema changes, and a single enforced access path.

    What’s Actually in v4.0

    The MCP (Model Context Protocol) server is the centerpiece. Running on port 6071 over HTTPS with bearer token auth, it exposes 30+ tools that any MCP-compatible agent (Claude Code, GitHub Copilot, Cursor, Warp) can discover and call via standard JSON-RPC. Tools span schema discovery (list_schemas, list_tables, list_columns), safe read-only execution (run_sql_readonly, explain_sql), and full RAG search capabilities (rag.search_fts, rag.search_vector, rag.search_hybrid). All MCP requests pass through MCP Query Rules, analogous to ProxySQL’s existing mysql_query_rules, which can allow, block, rewrite, or timeout requests before they ever reach MySQL.
    This is where you enforce read-only access, prevent data exfiltration, and audit everything agents are doing.

    The Autodiscovery system (discovery.run_static) runs a two-phase process: static schema harvesting followed by LLM-enriched metadata including summaries and domain analysis. Everything lands in a local SQLite catalog (mcp_catalog.db) that agents can then search semantically via llm.search. The NL2SQL workflow builds on this: agents search the catalog for relevant schemas, synthesize or reuse SQL templates, execute safely via run_sql_readonly, and optionally store successful query patterns as templates for future reuse, a continuous learning loop that improves accuracy over time.

    What’s Still Coming

    The presentation is upfront that this is a prototype showcase. Still on the roadmap: automatic embedding generation (with local or external model options), real-time indexing via MySQL replication/binlog without touching source tables, DISTANCE() SQL semantics for vector search on AI-blind MySQL backends, and additional MCP endpoints for config management, cache inspection, and observability.

    The Bottom Line

    The proxy layer argument is compelling: it’s operationally mature, protocol-aware, already deployed in front of critical databases, and has a battle-tested policy engine. Adding AI there rather than in application code means one place to enforce rules, one place to audit, and zero changes to the workloads that depend on MySQL stability.

    The code is on the v4.0 branch at github.com/sysown/proxysql, and the Generative AI documentation section at sysown.github.io/proxysql covers the new features. Download the Bringing GenAI to Every MySQL Instance presentation given by René Cannaò at preFOSDEM MySQL Belgian Days 2026 in Brussels.

    The post Bringing GenAI to Every MySQL Instance: ProxySQL v4.0 appeared first on ProxySQL.
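To make the JSON-RPC tool calls concrete, here is a minimal sketch of the payload an MCP-compatible agent might send to invoke one of the RAG search tools. This assumes the generic MCP "tools/call" request shape; the argument keys ("query", "top_k"), the placeholder token, and the exact endpoint details are illustrative assumptions, not taken from the ProxySQL documentation.

```python
import json

# Sketch: build a JSON-RPC 2.0 request for an MCP tool invocation.
# "tools/call" with {"name": ..., "arguments": ...} follows the generic
# MCP convention; argument keys here are assumptions for illustration.
def build_mcp_tool_call(tool, arguments, request_id=1):
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# Hypothetical call to the rag.search_hybrid tool named in the post.
payload = build_mcp_tool_call("rag.search_hybrid",
                              {"query": "orders by region", "top_k": 5})

# Per the post, the server listens on port 6071 over HTTPS with bearer
# token auth, so a client would send this body with such headers:
headers = {
    "Authorization": "Bearer <token>",  # placeholder, not a real token
    "Content-Type": "application/json",
}
body = json.dumps(payload)
print(body)
```

A request like this would pass through the MCP Query Rules described above before any SQL reaches MySQL, which is where the allow/block/rewrite policies apply.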