Searching serialized data, using active record

I'm trying to do a simple query on a serialized column. How do you do this?

serialize :mycode, Array


1.9.3p125 :026 > MyModel.find(104).mycode
  MyModel Load (0.6ms)  SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`id` = 104 LIMIT 1
 => [43565, 43402] 
1.9.3p125 :027 > MyModel.find_all_by_mycode("[43402]")
  MyModel Load (0.7ms)  SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`mycode` = '[43402]'
 => [] 
1.9.3p125 :028 > MyModel.find_all_by_mycode(43402)
  MyModel Load (1.2ms)  SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`mycode` = 43402
 => [] 
1.9.3p125 :029 > MyModel.find_all_by_mycode([43565, 43402])
  MyModel Load (1.1ms)  SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`mycode` IN (43565, 43402)
 => [] 


Solution 1:[1]

Basically, you can't. The downside of #serialize is that you're bypassing your database's native abstractions. You're pretty much limited to loading and saving the data.

That said, one very good way to slow your application to a crawl could be:

MyModel.all.select { |m| m.mycode.include? 43402 }

Moral of the story: don't use #serialize for any data you need to query on.
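
To see why none of the finders in the question match, it can help to look at what the column actually holds. The sketch below assumes serialize is using its default YAML coder and uses only Ruby's standard library, so no database is involved. The attempts array is my own paraphrase of the values the three finder calls compared against.

```ruby
require 'yaml'

# What `serialize :mycode, Array` would write for the record in the question,
# assuming the default YAML coder.
stored = [43565, 43402].to_yaml

puts stored.inspect
# => "---\n- 43565\n- 43402\n"

# Roughly the values the finders in the question compared the column against.
# None of them equals the stored YAML text, so every query came back empty.
attempts = ["[43402]", "43402", "43565, 43402"]
attempts.each do |attempt|
  puts "#{attempt.inspect} matches? #{attempt == stored}"
end
```

The database only ever sees that YAML string, so equality and IN comparisons against numbers or array literals have nothing to match.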

Solution 2:[2]

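This is just a trick to avoid slowing down your application: compare the column against the serialized value, which you get with .to_yaml.

This only gives exact matches, so the array must contain the same elements in the same order as the stored one: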

MyModel.where("mycode = ?", [43565, 43402].to_yaml)
#=> [#<MyModel id:...]

Tested only for MySQL.

Solution 3:[3]

A serialized array is stored in the database as YAML, one element per line. For example,

[1, 2, 3, 4]

is stored as

---\n- 1\n- 2\n- 3\n- 4\n

Hence the query would be

MyModel.where("mycode like ?", "% 2\n%")

Note the space between % and 2: it stops the pattern from also matching values such as 12 or 22.
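
The effect of that space can be checked without a database. This is a rough sketch, assuming the default YAML coder and integer elements. yaml_item_fragment is a hypothetical helper, and String#include? stands in for what LIKE does with the same text between the % signs.

```ruby
require 'yaml'

# Hypothetical helper: the literal text that has to appear in the column
# for `value` to be an element of a YAML-serialized array of integers.
def yaml_item_fragment(value)
  " #{value}\n"
end

# Three pretend rows, keyed by a label, holding serialized arrays.
rows = {
  a: [1, 2, 3, 4].to_yaml,
  b: [12, 22].to_yaml,
  c: [20, 21].to_yaml
}

fragment = yaml_item_fragment(2)

# Rows containing " 2\n", i.e. what LIKE "% 2\n%" would return.
strict = rows.select { |_, text| text.include?(fragment) }.keys

# Rows matched by the looser pattern "%2%".
loose = rows.select { |_, text| text.include?("2") }.keys

puts strict.inspect # => [:a]
puts loose.inspect  # => [:a, :b, :c]
```

The leading space and trailing newline act as delimiters around the element, which is why only the row that really contains 2 survives the stricter pattern.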

Solution 4:[4]

Noodl's answer is broadly right, but it doesn't hold in every case.

It really depends on the database/ORM adapter you are using. For instance, PostgreSQL can now store and search hashes/JSON (check out hstore). I remember reading that the ActiveRecord adapter for PostgreSQL now handles it properly. And if you are using Mongoid or something like that, then you are using unstructured data (i.e. JSON) at the database level everywhere.

However, if you are using a database that can't really handle hashes, like the MySQL / ActiveRecord combination, then the only reason to use a serialized field is for some data that you create / write in a background process and display / output on demand. The only two uses I have found in my experience are reports (like a stat field on a Product model, where I need to store some averages and medians for a product) and user options (like their preferred template color, which I really don't need to query on). User information such as a mailing-list subscription, on the other hand, needs to be searchable for email blasts.

PostgreSQL hstore ActiveRecord Example:

MyModel.where("mycode @> hstore(?, ?)", KEY, VALUE.to_s)

UPDATE: As of 2017, both MariaDB and MySQL support JSON field types.

Solution 5:[5]

You can query the serialized column with a SQL LIKE statement.

MyModel.where("mycode LIKE ?", "%43402%")

This is quicker than using include?, but you cannot pass an array as the parameter, and the pattern also matches any stored value that merely contains those digits (143402, for example).

Solution 6:[6]

Good news! If you're using PostgreSQL with hstore (which is super easy with Rails 4), you can now totally search serialized data. This is a handy guide, and here's the syntax documentation from PG.

In my case I have a dictionary stored as a hash in an hstore column called amenities. To check for a couple of queried amenities that have a value of 1 in the hash, I just do

House.where("amenities @> 'wifi => 1' AND amenities @> 'pool => 1'")

Hooray for improvements!

Solution 7:[7]

There's a blog post from 2009 from FriendFeed that describes how to use serialized data within MySQL.

What you can do is create tables that function as indexes for any data that you want to search.

Create a model that contains the searchable values/fields

In your example, the models would look something like this:

class MyModel < ApplicationRecord
  # id, name, other fields...
  serialize :mycode, Array
end

class Item < ApplicationRecord
  # id, value...
  belongs_to :my_model
end

Creating an "index" table for searchable fields

When you save MyModel, you can do something like this to create the index:

Item.where(my_model: self).destroy_all
self.mycode.each do |mycode_item|
  Item.create(my_model: self, value: mycode_item)
end

Querying and Searching

Then when you want to query and search just do:

Item.where(value: [43565, 43402]).all.map(&:my_model)
Item.where(value: 43402).all.map(&:my_model)

You can add a method to MyModel to make that simpler:

def self.find_by_mycode(value_or_values)
  Item.where(value: value_or_values).all.map(&:my_model)
end

MyModel.find_by_mycode([43565, 43402])
MyModel.find_by_mycode(43402)

To speed things up, you will want to create a SQL index for that table.
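
The shape of the lookup can be tried out in plain Ruby before touching the schema. The following is only an in-memory analogy, assuming a hash stands in for the items table. ItemIndex and its methods are hypothetical names, not part of Rails.

```ruby
require 'set'

# In-memory stand-in for the items table: value -> ids of owning records.
class ItemIndex
  def initialize
    @ids_by_value = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Mirrors the "destroy the old rows, create new ones" step done on save.
  def reindex(model_id, values)
    @ids_by_value.each_value { |ids| ids.delete(model_id) }
    values.each { |value| @ids_by_value[value] << model_id }
  end

  # Accepts one value or an array, like Item.where(value: ...).
  def find_ids(value_or_values)
    Array(value_or_values)
      .flat_map { |value| @ids_by_value.fetch(value, Set.new).to_a }
      .uniq
      .sort
  end
end

index = ItemIndex.new
index.reindex(104, [43565, 43402])
index.reindex(105, [43402])

puts index.find_ids(43402).inspect          # => [104, 105]
puts index.find_ids([43565, 43402]).inspect # => [104, 105]
```

Unlike the map(&:my_model) calls above, this sketch removes duplicates, which is worth considering when one record matches several of the searched values.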

Solution 8:[8]

Using the following comments in this post

https://stackoverflow.com/a/14555151/936494

https://stackoverflow.com/a/15287674/936494

I was successfully able to query a serialized Hash in my model

class Model < ApplicationRecord
  serialize :column_name, Hash
end

When column_name holds a Hash like

{ my_data: [ { data_type: 'MyType', data_id: 113 } ] }

we can query it in the following manner

Model.where("column_name = ?", hash.to_yaml)

That generates a SQL query like

Model Load (0.3ms)  SELECT "models".* FROM "models" WHERE (column_name = '---
:my_data:
- :data_type: MyType
  :data_id: 113
')

If anybody is interested in executing the generated query in a SQL terminal, it should work, but take care that the value is in exactly the format stored in the DB. There is an easier way, which I found at "PostgreSQL newline character": use a raw string containing newline characters

select * from table_name where column_name = E'---\n:my_data:\n- :data_type: MyType\n  :data_id: 113\n'

The most important part of the above query is the E prefix, which makes PostgreSQL treat \n as a newline rather than as literal characters.

Note: The database on which I executed the above is PostgreSQL.
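
Because the comparison is against the full YAML text, the hash passed to to_yaml has to match what was saved in key type and key order, not just in content. Here is a small check using only the standard library (the variable names are illustrative):

```ruby
require 'yaml'

# The hash as it was saved, with symbol keys.
saved = { my_data: [{ data_type: 'MyType', data_id: 113 }] }

# Same content, but string keys instead of symbols.
string_keys = { 'my_data' => [{ 'data_type' => 'MyType', 'data_id' => 113 }] }

# Same content, but the inner keys in a different order.
reordered = { my_data: [{ data_id: 113, data_type: 'MyType' }] }

puts saved.to_yaml.inspect
# => "---\n:my_data:\n- :data_type: MyType\n  :data_id: 113\n"

puts saved.to_yaml == string_keys.to_yaml # => false
puts saved.to_yaml == reordered.to_yaml   # => false
```

So an equality query built from a hash with string keys, or with keys in another order, would quietly return nothing.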

Solution 9:[9]

To search a serialized list, you need to wrap each item in unique prefix and postfix characters.

Example:

Rather than something like:

2345,12345,1234567, which would cause issues if you tried to search for 2345 (it would also match 12345), you store something like <2345>,<12345>,<1234567> and search for <2345> (the search query gets transformed). Of course, the choice of prefix/postfix characters depends on the valid data that will be stored. You might instead use something like ||| if you expect < and potentially | to appear in the data. Of course, that increases the size of the field and could cause performance issues.

Using a trigram index or something similar would avoid potential performance issues.

You can serialize it like data.map { |d| "<#{d}>" }.join(',') and deserialize it via data.gsub('<', '').gsub('>', '').split(','). A serializer class would do the job quite well to load/extract the data.

The way you do this is by setting the database field to text and using Rails' serialize model method with a custom lib class. The lib class needs to implement two methods:

def self.dump(obj) # (returns string to be saved to database)
def self.load(text) # (returns object)

Example with duration. Extracted from the article so link rot wouldn't get it, please visit the article for more information. The example uses a single value, but it's fairly straightforward to serialize a list of values and deserialize the list using the methods mentioned above.

class Duration
  # Used for `serialize` method in ActiveRecord
  class << self
    def load(duration)
      self.new(duration || 0)
    end

    def dump(obj)
      unless obj.is_a?(self)
        raise ::ActiveRecord::SerializationTypeMismatch,
          "Attribute was supposed to be a #{self}, but was a #{obj.class}. -- #{obj.inspect}"
      end

      obj.length
    end
  end


  attr_accessor :minutes, :seconds

  def initialize(duration)
    @minutes = duration / 60
    @seconds = duration % 60
  end

  def length
    (minutes.to_i * 60) + seconds.to_i
  end
end
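
For the list case described at the top of this answer, a coder along the same lines might look like the sketch below. DelimitedList and like_pattern are hypothetical names of my own, the < and > delimiters are assumed never to appear in the data, and load hands the items back as strings rather than integers.

```ruby
# Hypothetical coder for a delimited list stored in a text column.
class DelimitedList
  # Returns the string to be saved to the database, e.g. "<1>,<2>".
  def self.dump(list)
    Array(list).map { |item| "<#{item}>" }.join(',')
  end

  # Returns the list of items (as strings) read back from the column.
  def self.load(text)
    return [] if text.nil? || text.empty?

    text.scan(/<([^>]*)>/).flatten
  end

  # Builds the LIKE pattern that finds rows containing `item`.
  def self.like_pattern(item)
    "%<#{item}>%"
  end
end

stored = DelimitedList.dump([2345, 12345, 1234567])

puts stored                               # => <2345>,<12345>,<1234567>
puts DelimitedList.load(stored).inspect   # => ["2345", "12345", "1234567"]
puts DelimitedList.like_pattern(2345)     # => %<2345>%
```

A class like this could then be handed to serialize as the coder for a text column and queried with something like MyModel.where("mycode LIKE ?", DelimitedList.like_pattern(2345)).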

Solution 10:[10]

If you have a serialized JSON column and you want to run a LIKE query against it, do it like this:

YourModel.where("hashcolumn like ?", "%#{search}%")
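
A bare search term will also match key names and unrelated values. One way to narrow it, sketched here under the assumption that the column holds compact JSON (no spaces after : or ,) and a flat hash, is to search for the whole "key":"value" pair. json_pair_fragment is a hypothetical helper.

```ruby
require 'json'

# What a JSON-serialized column might hold, assuming compact JSON output.
stored = { 'theme' => 'dark', 'color' => 'blue' }.to_json

# Hypothetical helper: the `"key":"value"` text to look for, built by
# serializing a one-pair hash and dropping the surrounding braces.
def json_pair_fragment(key, value)
  { key => value }.to_json[1..-2]
end

fragment = json_pair_fragment('color', 'blue')
like_pattern = "%#{fragment}%"

puts like_pattern                          # => %"color":"blue"%
puts stored.include?(fragment)             # => true
puts stored.include?('"color":"dark"')     # => false
puts stored.include?('dark')               # => true (bare term matches the theme value)
```

The pattern would be passed as the bind value, as in YourModel.where("hashcolumn like ?", like_pattern). Values containing % or _ would still need escaping, since LIKE treats them as wildcards.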

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 noodl
Solution 2
Solution 3 KBart
Solution 4
Solution 5 Kyle C
Solution 6 NealJMD
Solution 7
Solution 8 Jignesh Gohel
Solution 9
Solution 10 shmee