What's the best way to calculate similarity between rows in a table based on association?

Suppose each Person has a collection of favorite Books.

So I have a table for:

  • How can I attach a database?
  • Software Auto Update
  • Questions about local database vs service-based database
  • How do I drop a column with object dependencies in SQL Server 2008?
  • What would prevent code running in a Docker container from connecting to a database on a separate server?
  • SSIS XMLSource only seeing null values in XML variable
    • Person
    • Book
    • The association between Person and Book (joint table for MxN)

    I want to fetch the Persons that are similar to a Person1 based on the favorite Books overlaping. That is: The more books they have in common, the more they are similar.

    I don’t have to use only SQL to solve this problem. I could use programming also. I’m using SQL Server 2008 and C#.

    What solution would you experts use?

  • Visual Studio 2010 built-in fake/test data generator for SQL Server
  • Cheap/Best Dashboard Solution for Microsoft SQL Server Reporting Services and .Net
  • SQL generated by EntityFramework StartsWith() contains plan altering ESCAPE '~' (tilde)
  • SqlDataReader Index Out Of Range Exception when correct column count is returned
  • How do I get LinqToSql to pass “index hints” to sql server?
  • How to create SQL Server table schema from a XML schema? (with .NET and Visual Studio 2008)
  • 3 Solutions collect form web for “What's the best way to calculate similarity between rows in a table based on association?”

    This may not be the most efficient, but it’s relatively simple:

    WITH SimlarBookPrefs(person_id, similar_person_id, booksInCommon) AS
    (
     Select p1.person_id, p2.person_id AS simlar_person_id,   
     /* Find the number of books p1 and p2 have in common */
       (SELECT COUNT(*) FROM PersonBook pb1, PersonBook pb2 
         JOIN pb1=book_id=pb2.book_id
       WHERE pb1.person_id=p1.person_id AND pb2.person_id=p2.person_id) As BooksInCommon
       FROM Person p1 CROSS JOIN Person p2
    )
    

    This will give you for each person, a list of other persons and the number books in common.

    To get the most similar person, add (in the same query)

    SELECT TOP 1 similar_person_id FROM SimilarBookPrefs 
       WHERE person_id = <person_to_match>
       ORDER By booksInCommon DESC;
    

    The first part does not have to be a CTE (i.e. WITH …) it can be a view or even a derived table. It’a a CTE here for brevity.

    If I were doing this in C#, I might tackle it like this

    var query = from personBook in personBooks
                where personBook.PersonId != basePersonId // ID of person to match
                join bookbase in personBooks
                on personBook.BookId equals bookbase.BookId
                where bookbase.PersonId == basePersonId // ID of person to match
                join person in persons 
                on personBook.PersonId equals person.Id 
                group person by person into bookgroup
                select new
                {
                    Person = bookgroup.Key, 
                    BooksInCommon = bookgroup.Count()
                };
    

    This could likely be done with the entity framework or Linq to SQL, or simply translated into SQL directly.

    Full sample code

    class CommonBooks
    {
        static void Main()
        {
            List<Person> persons = new List<Person>()
            {
                new Person(1, "Jane"), new Person(2, "Joan"), new Person(3, "Jim"), new Person(4, "John"), new Person(5, "Jill")
            };
    
            List<Book> books = new List<Book>()
            {
                new Book(1), new Book(2), new Book(3), new Book(4), new Book(5)
            };
    
            List<PersonBook> personBooks = new List<PersonBook>()
            {
                new PersonBook(1,1), new PersonBook(1,2), new PersonBook(1,3), new PersonBook(1,4), new PersonBook(1,5), 
                new PersonBook(2,2), new PersonBook(2,3), new PersonBook(2,5), 
                new PersonBook(3,2), new PersonBook(3,4), new PersonBook(3,5), 
                new PersonBook(4,1), new PersonBook(4,4),
                new PersonBook(5,1), new PersonBook(5,3), new PersonBook(5,5)
            };
    
            int basePersonId = 4; // person to match likeness
    
            var query = from personBook in personBooks
                        where personBook.PersonId != basePersonId
                        join bookbase in personBooks
                        on personBook.BookId equals bookbase.BookId
                        where bookbase.PersonId == basePersonId
                        join person in persons
                        on personBook.PersonId equals person.Id
                        group person by person into bookgroup
                        select new
                        {
                            Person = bookgroup.Key,
                            BooksInCommon = bookgroup.Count()
                        };
    
            foreach (var item in query)
            {
                Console.WriteLine("{0}\t{1}", item.Person.Name, item.BooksInCommon);
            }
    
            Console.Read();
        }
    }
    
    class Person
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public Person(int id, string name) { Id = id; Name = name; }
    }
    
    class Book
    {
        public int Id { get; set; }
        public Book(int id) { Id = id; }
    }
    
    class PersonBook
    {
        public int PersonId { get; set; }
        public int BookId { get; set; }
        public PersonBook(int personId, int bookId) { PersonId = personId; BookId = bookId; }
    }
    

    The problem you are describing is usually referred to as “collaborative filtering” and tackled using “recommender systems”. Googling for either of those terms should lead you to plenty of useful information.

    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.